ericxsun / fastText

Library for fast text representation and classification.

Load dict from trained model #2

Closed yohanguez closed 6 years ago

yohanguez commented 6 years ago

Hi,

Being able to fine-tune a pre-trained model is an amazing tool. However, when I run my command './fasttext supervised -input -inputModel -output -thread 25 -incr', it prints 'Load dict from trained model' and never goes on to the next step (I waited more than 1 hour).

Is the syntax correct? What am I missing?

Thanks, Yohan

AritzBi commented 6 years ago

I don't know why, but it takes a very long time to load the model's '.bin' file as a dictionary. I've been trying to load the English pre-trained word vectors, and after a day it still hasn't finished the "Load dict from trained model" step.

benman1 commented 6 years ago

I am also trying this, with ./fasttext [...] -inputModel myModel.bin -inc, and I get the following output:

Update args
Load dict from trained model
Load dict from training data
Read 132M words
Number of words:  3919267
Number of labels: 0
Merge dict
fasttext: src/dictionary.cc:139: std::__cxx11::string fasttext::Dictionary::getWord(int32_t) const: Assertion `id < size_' failed.

ericxsun commented 6 years ago

@yohanguez @AritzBi @benman1 Actually, no error occurred when I ran the following steps.

  1. Train the first model: ./fasttext supervised -input training_data -output supervised-model

supervised-model.bin and supervised-model.vec were generated.

  2. Run incremental training with the new data (new_training_data): ./fasttext supervised -input new_training_data -inputModel supervised-model.bin -output re-supervised-model -incr

Training started, and then re-supervised-model.bin and re-supervised-model.vec were generated.
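The two steps above can be put together in a small script. This is only a sketch: the FASTTEXT and DRY_RUN variables and the run helper are hypothetical conveniences, not part of fastText, and the file names are the placeholders used in the steps above. The -inputModel and -incr flags are the ones quoted in this thread for this fork. By default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch of the two-step workflow described above.
# FASTTEXT and DRY_RUN are hypothetical knobs, not fastText options.
FASTTEXT="${FASTTEXT:-./fasttext}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 1: train the first model
# (expected to write supervised-model.bin and supervised-model.vec).
train_initial() {
    run "$FASTTEXT" supervised -input training_data -output supervised-model
}

# Step 2: continue training on new data, starting from the step-1 model.
# -inputModel and -incr are the flags quoted in this thread.
train_incremental() {
    run "$FASTTEXT" supervised -input new_training_data \
        -inputModel supervised-model.bin -output re-supervised-model -incr
}

train_initial && train_incremental
```

Running the steps one at a time like this also makes it easier to see which step stalls or fails.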

The same works for unsupervised models such as 'cbow'.

Could you debug it step by step? I would appreciate any effort. I'll also examine the code.

ericxsun commented 6 years ago

I ran training and incremental training with word-vector-example.sh and got no errors. Please retry with the latest code. Thanks.