Closed by cocoxu 5 years ago
Probably, yes. @flauted, can you please have a look?
Yeah, embeddings_to_torch.py never got updated after #1216. Thanks for the details. I'll open a PR shortly.
Thanks! It works now.
I submitted a small patch (cocoxu:patch-1) to update the command line example in 'OpenNMT-py/docs/source/FAQ.md' for './tools/embeddings_to_torch.py'. (Not sure if this is the right way to submit a PR -- will read the guidelines.)
No, try googling how to send a PR. It is not very difficult, but it does need some specific steps.
I have an input vocab file (vocab.txt) and would like to load it in
Does anyone know how to do this?
I preprocessed the data by this command:
python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo
then tried to load the Glove embeddings by this command:
python ./tools/embeddings_to_torch.py -emb_file_enc "glove_dir/glove.6B.100d.txt" -emb_file_dec "glove_dir/glove.6B.100d.txt" -dict_file "data/demo.vocab.pt" -output_file "data/demo_gloveembeddings"
but got the following error:
Did I miss something? Or is it due to a vocab file incompatibility between the current versions of preprocess.py and embeddings_to_torch.py?
Looked a bit more into this ... it looks like at some point the onmt.inputters.text_dataset.TextMultiField class was changed: the "vocab" attribute was removed, and it now only has a "fields" attribute.
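To illustrate the mismatch, here is a minimal sketch of how a script could read the vocab from either layout. It is not the actual fix from the PR. It assumes that the newer object's "fields" attribute is a list of (name, field) pairs with the base text field first, and it uses stand-in objects rather than real onmt classes; get_vocab, old_style and new_style are hypothetical names.

```python
from types import SimpleNamespace


def get_vocab(field):
    """Return the vocab from a field object in either the old or new layout."""
    # Old layout (assumption): the field object exposes `vocab` directly.
    vocab = getattr(field, "vocab", None)
    if vocab is not None:
        return vocab
    # New layout (assumption): `fields` is a list of (name, sub_field) pairs,
    # and the first pair holds the base text field that owns the vocab.
    sub_fields = getattr(field, "fields", None)
    if sub_fields:
        _name, base_field = sub_fields[0]
        return base_field.vocab
    raise AttributeError("field object has neither 'vocab' nor 'fields'")


# Stand-in objects for illustration only (hypothetical, not onmt classes).
old_style = SimpleNamespace(vocab=["<unk>", "hello"])
new_style = SimpleNamespace(
    fields=[("src", SimpleNamespace(vocab=["<unk>", "hello"]))]
)

print(get_vocab(old_style))
print(get_vocab(new_style))
```

A fallback like this would let one script handle vocab files saved before and after the change, although updating the script to the new layout only is the simpler option if old files do not need to be supported.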