OpenNMT / OpenNMT-py

Open Source Neural Machine Translation and (Large) Language Models in PyTorch
https://opennmt.net/
MIT License

can't save vocab, pickling error #1165

Closed. Dabuk closed this issue 5 years ago.

Dabuk commented 5 years ago

I get the following error when I run the preprocess.py script:

python preprocess.py -train_src data/morph/src.train -train_tgt data/morph/tgt.train -valid_src data/morph/src.valid -valid_tgt data/morph/tgt.valid -save_data data/morph

[2019-01-09 20:31:10,776 INFO] Extracting features...
[2019-01-09 20:31:10,778 INFO] number of source features: 0.
[2019-01-09 20:31:10,778 INFO] number of target features: 0.
[2019-01-09 20:31:10,778 INFO] Building Fields object...
[2019-01-09 20:31:10,778 INFO] Building & saving training data...
[2019-01-09 20:31:10,778 INFO] Reading source and target files: data/morph/src.train data/morph/tgt.train.
[2019-01-09 20:31:10,802 INFO] Building shard 0.
[2019-01-09 20:31:10,859 INFO] saving 0th train data shard to data/amr.train.0.pt.
[2019-01-09 20:31:10,970 INFO] Building & saving validation data...
[2019-01-09 20:31:10,970 INFO] Reading source and target files: data/morph/src.valid data/morph/tgt.valid.
[2019-01-09 20:31:10,972 INFO] Building shard 0.
[2019-01-09 20:31:10,973 INFO] saving 0th valid data shard to data/amr.valid.0.pt.
[2019-01-09 20:31:10,992 INFO] Building & saving vocabulary...
[2019-01-09 20:31:11,009 INFO] reloading data/amr.train.0.pt.
[2019-01-09 20:31:11,045 INFO] tgt vocab size: 31.
[2019-01-09 20:31:11,046 INFO] * src vocab size: 69.
Traceback (most recent call last):
  File "preprocess.py", line 177, in
    main()
  File "preprocess.py", line 173, in main
    build_save_vocab(train_dataset_files, fields, opt)
  File "preprocess.py", line 118, in build_save_vocab
    torch.save(fields, vocab_path)
  File "/home/dhanush/Software/miniconda2/envs/py36/lib/python3.6/site-packages/torch/serialization.py", line 218, in save
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/dhanush/Software/miniconda2/envs/py36/lib/python3.6/site-packages/torch/serialization.py", line 143, in _with_file_like
    return body(f)
  File "/home/dhanush/Software/miniconda2/envs/py36/lib/python3.6/site-packages/torch/serialization.py", line 218, in
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/dhanush/Software/miniconda2/envs/py36/lib/python3.6/site-packages/torch/serialization.py", line 291, in _save
    pickler.dump(obj)
_pickle.PicklingError: Can't pickle <function Field. at 0x7f23213f5c80>: attribute lookup Field. on torchtext.data.field failed

I get the same error with my own data as well.

Any idea why this is happening?

I am using Python 3.6 and PyTorch 1.0.0.

vince62s commented 5 years ago

What is your torchtext version?

Dabuk commented 5 years ago

0.3.1

vince62s commented 5 years ago

Did you try installing torchtext from git+https://github.com/pytorch/text? Let us know if it fixes it. @bpopeters, I'm not sure which version made fields picklable.

bpopeters commented 5 years ago

I'm not sure exactly which version fixed it, but I have torchtext 0.4.0 and fields can be saved. If I recall correctly, in earlier versions of torchtext the default tokenizer for fields was a lambda expression, for some reason, and that was the only reason fields couldn't be pickled. tl;dr: try torchtext 0.4.0.
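The lambda explanation can be illustrated with plain `pickle`, which `torch.save` uses by default (the traceback ends in `pickler.dump(obj)`). Pickle stores a function by reference to its module and qualified name, and a lambda's `<lambda>` name cannot be looked up again as an attribute, hence "attribute lookup ... failed". This is a minimal sketch, not torchtext code: `LambdaField`, `NamedField`, `whitespace_tokenize` and `can_pickle` are hypothetical names, and modelling the old default tokenizer as a lambda default argument is an assumption based on the recollection in the comment, not on the torchtext source.

```python
import pickle


def whitespace_tokenize(text):
    # Module-level function: pickle stores it by reference (module + name)
    # and can look it up again when loading.
    return text.split()


class LambdaField:
    # Hypothetical stand-in for a field whose default tokenizer is a lambda.
    # The default is evaluated in the class body, so the lambda's
    # __qualname__ is "LambdaField.<lambda>", which pickle cannot look up.
    def __init__(self, tokenize=(lambda s: s.split())):
        self.tokenize = tokenize


class NamedField:
    # Hypothetical stand-in that uses a named, module-level tokenizer instead.
    def __init__(self, tokenize=whitespace_tokenize):
        self.tokenize = tokenize


def can_pickle(obj):
    """Return True if pickle can serialize obj; print the error otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError) as exc:
        # Depending on the Python version, the failure surfaces as one of these.
        print(f"{type(exc).__name__}: {exc}")
        return False


if __name__ == "__main__":
    print(can_pickle(LambdaField()))  # prints the pickling error, then False
    print(can_pickle(NamedField()))   # True
```

If the lambda really was the only unpicklable attribute, as the comment recalls, replacing it with a named function is enough for the whole object to pickle; upgrading torchtext avoids patching this by hand.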

Digression: @vince62s, this reminds me, could we change the name of the `morph` dataset? It comes from some Serbo-Croatian grapheme-to-phoneme conversion experiments I did a couple of years ago and somehow got labeled as morphology even though it isn't.

vince62s commented 5 years ago

Sure. I just don't have access to the hosted datasets (at Harvard NLP), but this name should not be in there.

Dabuk commented 5 years ago

Solved the issue, thank you. @vince62s, your suggestion helped. @bpopeters, yes, I had an older version, hence the issue.

eduamf commented 5 years ago

Same error after updating the onmt modules.

File "/opt/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/serialization.py", line 292, in _save
    pickler.dump(obj)
_pickle.PicklingError: Can't pickle <function Field. at 0x7f24904688c8>: attribute lookup Field. on torchtext.data.field failed

I was using PyTorch 0.4.1. Then I noticed that requirements.txt had changed too, so I removed the old version and installed PyTorch 1.0.0. However, the error still occurs when preprocess.py tries to save the vocab.
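Since the thread points to torchtext, not PyTorch, as the component that matters, it may help to check the installed torchtext directly before rerunning preprocess.py. This is a sketch assuming Python 3.8+ for `importlib.metadata`; `parse_version`, `torchtext_is_new_enough` and the 0.4.0 threshold are illustrative. The threshold is simply the version reported as working earlier in the thread, not a documented minimum.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumed threshold: 0.4.0 is the version reported working in this thread,
# not an officially documented minimum.
MIN_TORCHTEXT = (0, 4, 0)


def parse_version(text):
    """Turn "0.3.1" into (0, 3, 1). A local tag ("+cpu") and anything after
    the third number (e.g. ".dev...") are ignored; missing parts become 0."""
    numbers = re.findall(r"\d+", text.split("+")[0])[:3]
    return tuple(int(n) for n in numbers) + (0,) * (3 - len(numbers))


def torchtext_is_new_enough(installed=None):
    """Return True if torchtext >= MIN_TORCHTEXT, False if older or missing."""
    if installed is None:
        try:
            installed = version("torchtext")
        except PackageNotFoundError:
            return False
    return parse_version(installed) >= MIN_TORCHTEXT


if __name__ == "__main__":
    print(torchtext_is_new_enough("0.3.1"))  # False: the version reported above
    print(torchtext_is_new_enough("0.4.0"))  # True
    print(torchtext_is_new_enough())         # depends on the local install
```

The parser is deliberately simple: a pre-release such as `0.4.0.dev...` is treated as 0.4.0, which is good enough for a quick sanity check but not a full version comparison.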

I think there is another problem when using ■ (the joiner), but it is better to take one thing at a time.

guotong1988 commented 5 years ago

Same question. Thank you!