MorinoseiMorizo / jparacrawl-finetune

An example usage of JParaCrawl pre-trained Neural Machine Translation (NMT) models.
http://www.kecl.ntt.co.jp/icl/lirg/jparacrawl/

How to load finetuned model? #3

Closed t-qureshi closed 3 years ago

t-qureshi commented 3 years ago

@MorinoseiMorizo Hi, thanks for sharing the steps to fine-tune the JParaCrawl model. I followed your steps and got a model, but I am unable to load it for inference.

from fairseq.models.transformer import TransformerModel
ja2en = TransformerModel.from_pretrained('./', checkpoint_file='checkpoint_bes.pt', data_name_or_path='./', bpe='sentencepiece', sentencepiece_model='./spm.ja.nopretok.model')

Traceback (most recent call last):
  File "", line 1, in
  File "/home/talha/Pictures/translation/jparacrawl-finetune/fairseq/fairseq/models/fairseq_model.py", line 279, in from_pretrained
    return hub_utils.GeneratorHubInterface(x["args"], x["task"], x["models"])
  File "/home/talha/Pictures/translation/jparacrawl-finetune/fairseq/fairseq/hub_utils.py", line 106, in __init__
    self.bpe = encoders.build_bpe(cfg.bpe)
  File "/home/talha/Pictures/translation/jparacrawl-finetune/fairseq/fairseq/registry.py", line 52, in build_x
    return builder(cfg, *extra_args, **extra_kwargs)
  File "/home/talha/Pictures/translation/jparacrawl-finetune/fairseq/fairseq/data/encoders/sentencepiece_bpe.py", line 23, in __init__
    sentencepiece_model = file_utils.cached_path(cfg.sentencepiece_model)
  File "/home/talha/Pictures/translation/jparacrawl-finetune/fairseq/fairseq/file_utils.py", line 166, in cached_path
    raise EnvironmentError("file {} not found".format(url_or_filename))
OSError: file ??? not found

MorinoseiMorizo commented 3 years ago

Hi,

Thank you for trying.

It looks like fairseq cannot find the specified file. Could you make sure that all paths are correct? I suspect two things: 1) 'checkpoint_bes.pt' may contain a typo (it is missing a 't'), and 2) data_name_or_path might be incorrect. Could you paste the output of ls ./?
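The path check suggested above can also be done from Python before calling from_pretrained. This is only a minimal sketch using the standard library: the helper name find_missing_files is hypothetical, and the list of required files is an assumption based on the call and the language pair in this thread, not something fairseq reports.

```python
from pathlib import Path


def find_missing_files(base_dir, file_names):
    """Return the names in file_names that are not regular files under base_dir."""
    base = Path(base_dir).expanduser().resolve()
    return [name for name in file_names if not (base / name).is_file()]


if __name__ == "__main__":
    # Assumed file names, taken from the from_pretrained call in this thread;
    # adjust them to match your own checkpoint and SentencePiece model.
    required = [
        "checkpoint_bes.pt",
        "dict.ja.txt",
        "dict.en.txt",
        "spm.ja.nopretok.model",
    ]
    missing = find_missing_files("./", required)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All files found in", Path("./").resolve())
```

If this reports nothing missing, the paths themselves are probably not the cause of the error.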

t-qureshi commented 3 years ago

Thanks for the quick reply, @MorinoseiMorizo. Here is the list of all files in the current directory:

ls
checkpoint_bes.pt dict.en.txt dict.ja.txt spm.en.nopretok.model spm.en.nopretok.vocab spm.ja.nopretok.model spm.ja.nopretok.vocab test_ja_model.model

[Image: Screenshot from 2020-11-05 11-10-17]

MorinoseiMorizo commented 3 years ago

I think your files are in the right place. I'm not sure why it doesn't work.

Have you tried the provided decode.sh script? I have at least confirmed that it works, and it would be much easier to try. https://github.com/MorinoseiMorizo/jparacrawl-finetune/blob/master/ja-en/decode.sh