Closed shyyhs closed 2 years ago
I tried several corpora, each containing more than 30M words.
When I train the model with the command below, the code reads only 30M words no matter how large the corpus is:
Read 30M words
Number of words: 136226
Number of labels: 0
Progress: 14.5% words/sec/thread: 21063 lr: 0.171061 loss: 2.602286 eta: 0h9m
Here is the training command I use:
$FASTTEXT \
  sent2vec -input $INPUT_FILE -output $OUTPUT_FILE \
  -wordNgrams 2 \
  -dim 768 \
  -minCountLabel 20 \
  -minCount 8 \
  -dropoutK 4 \
  -loss ns \
  -neg 10 \
  -lr 0.2 \
  -epoch 9 \
  -t 0.000005 \
  -neg 10 \
  -thread 20 \
  -numCheckPoints 1 \
  -bucket 4000000 \
  -bucketChar 2000000 \
Did I miss something?
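One sanity check that may help narrow this down is to compare the token count of the input file with the figure in the log. Below is a minimal sketch, assuming tokens are separated by whitespace. `count_tokens` is a hypothetical helper, not part of sent2vec, and the tool's own count may differ somewhat from what `wc -w` reports.

```shell
# Hypothetical helper (not part of sent2vec): count whitespace-separated
# tokens in a corpus file so the result can be compared with the
# "Read ...M words" line in the training log.
count_tokens() {
  # wc -w reads from stdin so no filename is printed; tr strips the
  # padding that some wc implementations add around the number.
  wc -w < "$1" | tr -d '[:space:]'
}

# Usage (assumes INPUT_FILE is set as in the training command):
#   count_tokens "$INPUT_FILE"
```

If the count is far above 30M, the file itself is not the limit and the reading step is worth a closer look.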