
bjherger / keras-pandas

keras-pandas allows users to rapidly build and iterate on deep learning models.

MIT License

57 stars, 14 forks

Archive embedding size #105

Open bjherger opened 5 years ago

bjherger commented 5 years ago

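Archive the vocab size for use when creating the embedding layer. A transient issue can occur when the maximum vocab index does not appear in the training data set, which leaves the embedding vectorizer with a larger vocab than the embedding matrix. Any later input that maps to one of the missing indices then has no corresponding row in the matrix.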

Current state

The embedding matrix derives its vocab size from the largest vocab index seen in the training set.
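
A minimal sketch of how this can fail, using a NumPy array as a stand-in for the embedding layer. The vocabulary, the index values, and the `max index + 1` sizing rule are illustrative assumptions, not code taken from keras-pandas.

```python
import numpy as np

# Hypothetical vocabulary learned by the vectorizer: 5 tokens, indices 0..4.
vocab = {"<unk>": 0, "red": 1, "green": 2, "blue": 3, "violet": 4}

# The training rows happen not to contain the highest index (4).
train_indices = np.array([1, 2, 3, 1, 0])

# Assumed current rule: size the matrix from the largest index seen in training.
inferred_vocab_size = int(train_indices.max()) + 1
embedding_matrix = np.zeros((inferred_vocab_size, 8))

# A later row containing "violet" maps to index 4, which has no row in the matrix.
try:
    embedding_matrix[vocab["violet"]]
    lookup_failed = False
except IndexError:
    lookup_failed = True
```

Here the vectorizer knows 5 tokens, but the matrix is built with only 4 rows, so the lookup for the unseen index fails.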

Future state

The embedding matrix pulls its vocab size from the transformation pipeline, or from another location that the transformation pipeline sets explicitly.
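
One way this could look, as a rough sketch only. `VocabIndexer`, `vocab_size_`, and `build_embedding_matrix` are hypothetical names invented for illustration, and reserving index 0 for unknown tokens is an assumption rather than confirmed keras-pandas behavior.

```python
import numpy as np


class VocabIndexer:
    """Hypothetical stand-in for the vectorizer step of the transformation pipeline."""

    def fit(self, values):
        # Index 0 is reserved for unknown tokens; known tokens start at 1.
        tokens = sorted(set(values))
        self.vocab_ = {token: i + 1 for i, token in enumerate(tokens)}
        # Archive the size explicitly so downstream code never has to infer it.
        self.vocab_size_ = len(self.vocab_) + 1
        return self

    def transform(self, values):
        return np.array([self.vocab_.get(v, 0) for v in values])


def build_embedding_matrix(indexer, embedding_dim=8):
    # The row count comes from the archived attribute, not from the data seen.
    return np.zeros((indexer.vocab_size_, embedding_dim))


indexer = VocabIndexer().fit(["red", "green", "blue", "violet"])
# A training subset that never contains the highest index.
train_indices = indexer.transform(["red", "green", "blue"])
embedding_matrix = build_embedding_matrix(indexer)
```

Because the size is read from the fitted transformer, the matrix has a row for every index the vectorizer can emit, regardless of which indices appear in a given training set.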