If VOCAB_SIZE is not constant and fixed, how can I use model.add_lookup_parameters((VOCAB_SIZE, EMB_DIM))?

clab / dynet

DyNet: The Dynamic Neural Network Toolkit

Apache License 2.0

If VOCAB_SIZE is not constant and fixed, how can I use model.add_lookup_parameters((VOCAB_SIZE, EMB_DIM))? #1498

Open fdujuan opened 5 years ago

fdujuan commented 5 years ago

I can't use V = model.add_lookup_parameters((VOCAB_SIZE, EMB_DIM)) because my VOCAB_SIZE is not constant: I have to deal with out-of-vocabulary (OOV) words for every article-title pair. Is there another way to handle a VOCAB_SIZE that is not fixed? Thank you. English is not my native language.

FilippoC commented 5 years ago

Can you estimate an upper bound on the size of the vocab and reserve this quantity at the beginning?
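
A minimal sketch of what reserving an upper bound could look like, in plain Python so it runs without DyNet installed. The `BoundedVocab` class, the `<unk>` token, and the `MAX_VOCAB` value are illustrative assumptions, not part of DyNet: new words claim free rows until the reserved capacity is used up, and any further new word shares the unknown-word row.

```python
class BoundedVocab:
    """Hands out row indices up to a capacity that is reserved in advance."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.word_to_id = {"<unk>": 0}  # row 0 is reserved for unknown words

    def index(self, word):
        # Known word: reuse its row.
        if word in self.word_to_id:
            return self.word_to_id[word]
        # New word and free rows remain: claim the next one.
        if len(self.word_to_id) < self.capacity:
            self.word_to_id[word] = len(self.word_to_id)
            return self.word_to_id[word]
        # Table is full: fall back to the shared unknown-word row.
        return self.word_to_id["<unk>"]


MAX_VOCAB = 4  # hypothetical upper bound; choose one that fits your data
vocab = BoundedVocab(MAX_VOCAB)
ids = [vocab.index(w) for w in ["the", "cat", "sat", "on", "the", "mat"]]
print(ids)

# The lookup table would then be allocated once with the reserved size:
# V = model.add_lookup_parameters((MAX_VOCAB, EMB_DIM))
```

Every index returned is smaller than `MAX_VOCAB`, so the table never needs to grow after it has been created.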

fdujuan commented 5 years ago

> Can you estimate an upper bound on the size of the vocab and reserve this quantity at the beginning?

I have tried this method, but something goes wrong when I use the test data. Maybe I made a mistake somewhere; I will try again. Thank you.

FilippoC commented 5 years ago

Do not forget to fix the dictionary at test time (so you still need an unknown-word representation for words outside the training set).
Words that are rarely seen during training will not have been updated much, meaning their representation may be wrong.
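
The two points above can be sketched as follows, again in plain Python. The helper names `build_vocab` and `encode`, the `<unk>` token, and the `min_count` threshold are assumptions made for illustration. The dictionary is built from the training data only, rare words are folded into the unknown-word entry so that its embedding actually gets trained, and the dictionary is never modified when encoding test data.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical name for the unknown-word token


def build_vocab(train_sentences, min_count=2):
    """Build the dictionary from training data only."""
    counts = Counter(w for sent in train_sentences for w in sent)
    word_to_id = {UNK: 0}
    # Sort for a deterministic index assignment.
    for word, n in sorted(counts.items()):
        # Rare words get no row of their own; they will map to UNK.
        if n >= min_count:
            word_to_id[word] = len(word_to_id)
    return word_to_id


def encode(sentence, word_to_id):
    """Map words to indices without ever adding new entries."""
    unk_id = word_to_id[UNK]
    return [word_to_id.get(w, unk_id) for w in sentence]


train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
word_to_id = build_vocab(train, min_count=2)
train_ids = [encode(s, word_to_id) for s in train]
test_ids = encode(["the", "bird", "sat"], word_to_id)
print(word_to_id, train_ids, test_ids)

# VOCAB_SIZE is now fixed and known before the model is created:
# V = model.add_lookup_parameters((len(word_to_id), EMB_DIM))
```

Because `encode` only reads the dictionary, a word first seen at test time ("bird" here) receives the same index as the rare training words.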

There are plenty of better solutions, starting with pre-trained word embeddings (GloVe or Polyglot, for example).
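
As a rough sketch of the pre-trained route, the snippet below reads vectors in the plain-text layout that GloVe files use (one word per line, followed by space-separated floats) and places them at the rows of an existing dictionary. The function name `load_text_embeddings`, the in-memory sample data, and the zero initialization for missing words are assumptions for illustration; copying the resulting table into DyNet lookup parameters is left as a comment, and the exact call should be checked against the DyNet documentation.

```python
import io


def load_text_embeddings(lines, word_to_id, emb_dim):
    """Fill an embedding table from 'word v1 v2 ...' lines."""
    # Rows without a pre-trained vector stay at zero in this sketch.
    table = [[0.0] * emb_dim for _ in range(len(word_to_id))]
    found = set()
    for line in lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        # Skip words outside the dictionary and malformed lines.
        if word in word_to_id and len(values) == emb_dim:
            table[word_to_id[word]] = [float(v) for v in values]
            found.add(word)
    return table, found


# Tiny in-memory stand-in for a real embedding file.
sample = io.StringIO("the 0.1 0.2\ncat 0.3 0.4\ndog 0.5 0.6\n")
word_to_id = {"<unk>": 0, "the": 1, "cat": 2}
table, found = load_text_embeddings(sample, word_to_id, emb_dim=2)
print(table, sorted(found))

# Assumption: the table would then be copied into the lookup parameters
# created with model.add_lookup_parameters((len(word_to_id), 2)).
```

With a real file, `sample` would be replaced by an open file handle, and `emb_dim` by the dimension of the downloaded vectors.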