OpenNMT / OpenNMT-py

Open Source Neural Machine Translation and (Large) Language Models in PyTorch
https://opennmt.net/
MIT License

AttributeError: 'TextMultiField' object has no attribute 'vocab' #1249

Closed cocoxu closed 5 years ago

cocoxu commented 5 years ago

I preprocessed the data with this command:

python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo

Then I tried to load the GloVe embeddings with this command:

python ./tools/embeddings_to_torch.py -emb_file_enc "glove_dir/glove.6B.100d.txt" -emb_file_dec "glove_dir/glove.6B.100d.txt" -dict_file "data/demo.vocab.pt" -output_file "data/demo_gloveembeddings"

But I got the following error:

Traceback (most recent call last):
  File "./tools/embeddings_to_torch.py", line 125, in <module>
    main()
  File "./tools/embeddings_to_torch.py", line 83, in main
    enc_vocab, dec_vocab = get_vocabs(opt.dict_file)
  File "./tools/embeddings_to_torch.py", line 20, in get_vocabs
    enc_vocab = fields['src'][0][1].vocab
AttributeError: 'TextMultiField' object has no attribute 'vocab'

Did I miss something? Or is it due to an incompatibility in the vocab file format between the current versions of preprocess.py and embeddings_to_torch.py?

I looked into this a bit more. It looks like the onmt.inputters.text_dataset.TextMultiField class was changed at some point: it no longer has a "vocab" attribute and now only has a "fields" attribute.

import torch
fields = torch.load("data/demo.vocab.pt")
print(fields['src'][0][1])
<onmt.inputters.text_dataset.TextMultiField object at 0x7fb527440860>
print(fields['src'][0][1].fields[0][1].vocab)
<torchtext.vocab.Vocab object at 0x7fb4c778f7f0>
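As a rough illustration of the attribute change described above, a lookup that tolerates both layouts might look like the sketch below. It is not the fix that went into embeddings_to_torch.py. The helper names extract_vocab and get_vocabs_from_fields are hypothetical, the 'tgt' entry is assumed to be stored the same way as 'src', and the stand-in objects only mimic the attributes seen in the session above so that the snippet runs without OpenNMT-py installed.

```python
from types import SimpleNamespace


def extract_vocab(field):
    """Return the vocab of a field, unwrapping a multi-field wrapper if needed."""
    if hasattr(field, "vocab"):
        # Older layout: the object is a plain field that carries its vocab.
        return field.vocab
    # Newer layout, as observed in this issue: the wrapper exposes
    # `.fields`, a list of (name, field) pairs; take the first field.
    return field.fields[0][1].vocab


def get_vocabs_from_fields(fields):
    """Hypothetical helper: pull encoder/decoder vocabs from a loaded dict."""
    # Assumption: 'tgt' is stored the same way as 'src'.
    enc_vocab = extract_vocab(fields["src"][0][1])
    dec_vocab = extract_vocab(fields["tgt"][0][1])
    return enc_vocab, dec_vocab


# Stand-in objects so the sketch runs without OpenNMT-py installed.
old_style = SimpleNamespace(vocab="src-vocab")
new_style = SimpleNamespace(fields=[("tgt", SimpleNamespace(vocab="tgt-vocab"))])
demo_fields = {"src": [("src", old_style)], "tgt": [("tgt", new_style)]}

enc_vocab, dec_vocab = get_vocabs_from_fields(demo_fields)
print(enc_vocab, dec_vocab)
```

With a real vocab file, the dictionary returned by torch.load("data/demo.vocab.pt") would take the place of demo_fields, provided its entries have the layout shown in the session above.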

vince62s commented 5 years ago

Probably, yes. @flauted, can you please have a look?

flauted commented 5 years ago

Yeah embeddings_to_torch.py never got updated after #1216. Thanks for the details. I'll open a PR shortly.

cocoxu commented 5 years ago

Thanks! It works now.

I submitted a small patch (cocoxu:patch-1) to update the command line example for './tools/embeddings_to_torch.py' in 'OpenNMT-py/docs/source/FAQ.md'. (Not sure if this is the right way to submit a PR -- I will read the guidelines.)

vince62s commented 5 years ago

No. Try googling how to send a PR; it is not very difficult, but it requires some specific steps.

HossamAmer12 commented 3 years ago

I have an input vocab file (vocab.txt) and would like to load it in

Does anyone know how to do this?