Closed (su0315 closed 1 year ago)
Hi, Thank you for trying our examples.
Which version of JParaCrawl pre-trained model are you using?
We are now providing both 1.0 and 3.0.
I'm very sorry that I forgot to mention this in the README, but we trained the 3.0 models on a different version of fairseq.
If you are using 3.0, you should use the following fairseq commit: ce961a9fd26aef5130720cb6a171ddd5b51a8961.
Another possible reason for the error
[Exception: Cannot load model parameters from checkpoint /home/sumire/main/NMT_models/jparacrawl/en-ja/small_en-ja/checkpoint_best.pt; please ensure that the architectures match.]
is that you are trying to fine-tune a small model while specifying the model architecture --arch transformer.
If you want to fine-tune a small model, the parameter should be --arch transformer_iwslt_de_en.
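To check which architecture a checkpoint was saved with before choosing --arch, one option is to open the file in Python and read the stored training settings. The following is only a minimal sketch: it assumes the checkpoint is a dict that keeps its settings either under "args" (older fairseq) or under "cfg" with a "model" section (newer fairseq), and the helper names lookup, checkpoint_arch, and load_state are hypothetical, not part of fairseq.

```python
import sys


def lookup(container, key):
    """Read key from a dict or an attribute-style object; None if absent."""
    if container is None:
        return None
    if isinstance(container, dict):
        return container.get(key)
    return getattr(container, key, None)


def checkpoint_arch(state):
    """Return the architecture name recorded in a loaded checkpoint dict, or None."""
    # Assumption: older checkpoints keep an argparse.Namespace under "args".
    args = lookup(state, "args")
    if args is not None:
        return lookup(args, "arch")
    # Assumption: newer checkpoints keep the settings under "cfg" -> "model".
    model = lookup(lookup(state, "cfg"), "model")
    return lookup(model, "arch")


def load_state(path):
    """Load a checkpoint onto the CPU so that no GPU memory is needed."""
    import torch  # imported lazily so the helpers above work without PyTorch

    # Note: recent PyTorch releases may need weights_only=False here,
    # because the checkpoint holds more than plain tensors.
    return torch.load(path, map_location="cpu")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 this_script.py /path/to/checkpoint_best.pt
    print(checkpoint_arch(load_state(sys.argv[1])))
```

If the printed name differs from the value passed to --arch, that mismatch would be consistent with the "please ensure that the architectures match" exception.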
I'm happy to help you if you need any further assistance. Thank you.
Hi, thank you for checking this, and sorry for the late reply! I am using version 3.0 of the JParaCrawl pre-trained model from this link (https://www.kecl.ntt.co.jp/icl/lirg/jparacrawl/).
Good news: I tried the fairseq version and the transformer architecture that you specified, and the error is solved!
Now I'm just getting a RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:121, unhandled cuda error, NCCL version 2.14.3 ncclUnhandledCudaError: Call to CUDA function failed. Last error: Cuda failure 'out of memory'
I will look into it. Thanks a lot!
That is good news! For the out-of-memory error, one possible solution is to reduce --max-tokens to a value that fits into your GPU memory. https://github.com/MorinoseiMorizo/jparacrawl-finetune/blob/master/en-ja/fine-tune_kftt_fp32.sh#L83
Hope it works.
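Lowering --max-tokens also shrinks the number of tokens seen per optimizer step. If that matters for the fine-tuning run, fairseq's --update-freq option (gradient accumulation) can compensate. The sketch below is only arithmetic under stated assumptions: the helper name update_freq_for and all numbers are hypothetical and are not taken from the linked script, and since --max-tokens is an upper bound per batch, the result is approximate.

```python
def update_freq_for(target_tokens_per_update, max_tokens, num_gpus=1):
    """Smallest --update-freq that keeps about target_tokens_per_update per step."""
    if target_tokens_per_update <= 0 or max_tokens <= 0 or num_gpus <= 0:
        raise ValueError("all arguments must be positive")
    tokens_per_batch = max_tokens * num_gpus
    # Integer ceiling division avoids floating-point rounding.
    return max(1, -(-target_tokens_per_update // tokens_per_batch))


if __name__ == "__main__":
    # Hypothetical starting point: --max-tokens 5000 with --update-freq 4 on one GPU.
    target = 5000 * 4
    # After lowering --max-tokens to 2000 to fit into GPU memory:
    print(update_freq_for(target, max_tokens=2000))
```

Accumulating gradients over more batches keeps the effective batch size similar at the cost of more forward and backward passes per update.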
Hi! Thanks for publishing the example usage, it helps me a lot.
I am fine-tuning the JParaCrawl base model on the Business Scene Dialogue Corpus https://github.com/tsuruoka-lab/BSD. I used the same parameters as your repo's fine-tuning script (fine-tune_kftt_fp32.sh). However, it gives the error in the title above.
Here is the code in my sh file.
Error Message: AttributeError: 'NoneType' object has no attribute 'task'
Environment
Additional context
After this bug, I also tried the latest fairseq version (0.12.2), installed with pip, and replaced
python3 $FAIRSEQ/train.py
with fairseq-train,
following the fairseq documentation (https://fairseq.readthedocs.io/en/latest/command_line_tools.html#fairseq-train). It then showed a different error: [Exception: Cannot load model parameters from checkpoint /home/sumire/main/NMT_models/jparacrawl/en-ja/small_en-ja/checkpoint_best.pt; please ensure that the architectures match.]
So I would like to know how to debug these two errors, one in each environment: [AttributeError: 'NoneType' object has no attribute 'task'] and [Exception: Cannot load model parameters from checkpoint /home/sumire/main/NMT_models/jparacrawl/en-ja/small_en-ja/checkpoint_best.pt; please ensure that the architectures match.]
Thanks in advance!