Helsinki-NLP / Tatoeba-Challenge

convert_marian_tatoeba_to_pytorch FileNotFoundError #28

Open Lyaaaaaaaaaaaaaaa opened 2 years ago

Lyaaaaaaaaaaaaaaa commented 2 years ago

Hello, I'm trying to convert more models to the PyTorch format, but I'm getting an error.

I'm running the convert_marian_tatoeba_to_pytorch script, which looks for a README.md file in the models/results folder, but that file does not exist.

Traceback (most recent call last):
  File "Tatoeba-Challenge/scripts/convert_marian_tatoeba_to_pytorch.py", line 1282, in <module>
    resolver = TatoebaConverter(save_dir=args.save_dir)
  File "Tatoeba-Challenge/scripts/convert_marian_tatoeba_to_pytorch.py", line 58, in __init__
    reg = self.make_tatoeba_registry()
  File "Tatoeba-Challenge/scripts/convert_marian_tatoeba_to_pytorch.py", line 264, in make_tatoeba_registry
    lns = list(open(p / "README.md").readlines())

FileNotFoundError: [Errno 2] No such file or directory: 'Tatoeba-Challenge/models/results/README.md'
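
The traceback shows make_tatoeba_registry opening README.md under models/results without first checking that it exists. A minimal pre-flight check, assuming the repository is cloned at Tatoeba-Challenge (the helper name find_results_readme is hypothetical and not part of the script), could confirm whether the file is present before running the converter:

```python
from pathlib import Path
from typing import Optional


def find_results_readme(repo_root: str) -> Optional[Path]:
    """Return the path to models/results/README.md, or None if it is missing."""
    # The relative path is taken from the FileNotFoundError message above.
    readme = Path(repo_root) / "models" / "results" / "README.md"
    return readme if readme.is_file() else None


if __name__ == "__main__":
    # "Tatoeba-Challenge" is the clone location seen in the traceback.
    readme = find_results_readme("Tatoeba-Challenge")
    if readme is None:
        print("models/results/README.md not found; the converter will fail")
    else:
        print(f"found {readme}")
```

If the check reports the file as missing, the converter would be expected to stop at the same line as in the traceback.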

jorgtied commented 1 year ago

Could you try this script: https://github.com/Helsinki-NLP/Opus-MT/blob/master/hf/convert_to_pytorch.py

Lyaaaaaaaaaaaaaaa commented 1 year ago

Hello, I will try this one and update you.

Lyaaaaaaaaaaaaaaa commented 1 year ago

Hello, sorry for the long delay. I ran your script and got another error: TypeError: expected str, bytes or os.PathLike object, not NoneType

The logs:

python3 model_converter/convert_to_pytorch.py --model-path opus-en-pt --dest-path converted/opus-en-pt

added 1 tokens to vocab
Traceback (most recent call last):
  File "/home/path_to_project/model_converter/convert_to_pytorch.py", line 28, in <module>
    convert(Path(args.model_path), Path(args.dest_path))
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/convert_marian_to_pytorch.py", line 663, in convert
    opus_state = OpusState(source_dir)
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/convert_marian_to_pytorch.py", line 494, in __init__
    self.tokenizer = self.load_tokenizer()
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/convert_marian_to_pytorch.py", line 593, in load_tokenizer
    return MarianTokenizer.from_pretrained(str(self.source_dir))
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained
    return cls._from_pretrained(
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1958, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/tokenization_marian.py", line 158, in __init__
    assert Path(source_spm).exists(), f"cannot find spm source {source_spm}"
  File "/home/path_to_env/lib/python3.9/pathlib.py", line 1082, in __new__
    self = cls._from_parts(args, init=False)
  File "/home/path_to_env/lib/python3.9/pathlib.py", line 707, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/home/path_to_env/lib/python3.9/pathlib.py", line 691, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
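
The last frames show why the "cannot find spm source" message never appears: source_spm is None, so Path(source_spm) raises before .exists() is evaluated. A minimal reproduction in plain Python, independent of transformers (the function name check_spm is hypothetical and only mirrors the shape of the failing line):

```python
from pathlib import Path


def check_spm(source_spm):
    # Same shape as the failing line in tokenization_marian.py:
    # Path(None) raises TypeError before .exists() or the assert message run.
    assert Path(source_spm).exists(), f"cannot find spm source {source_spm}"


try:
    check_spm(None)
except TypeError as err:
    # The exact wording of the message varies between Python versions.
    print(f"TypeError: {err}")
```

This suggests the tokenizer was never given a path to a SentencePiece file for this model folder, rather than being given a path that does not exist.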

Additional information:

jorgtied commented 1 year ago

Did you download the model that you want to convert? The script expects the model in the model path you specify on command-line. Maybe this makefile helps you to see how I use the script for converting models: https://github.com/Helsinki-NLP/Opus-MT/blob/master/hf/Makefile

Lyaaaaaaaaaaaaaaa commented 1 year ago

Hello, yes, I downloaded the model I want to convert, Opus-en-pt. I believe I downloaded the right format. Just in case, here is the list of files in the opus-en-pt folder:

decoder.yml
opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus.bpe32k-bpe32k.transformer.valid1.log
postprocess.sh
README.md
source.tcmodel
tokenizer_config.json
LICENSE
opus.bpe32k-bpe32k.transformer.train1.log
opus.bpe32k-bpe32k.vocab.yml
preprocess.sh
source.bpe
target.bpe
vocab.json

I have difficulty understanding the Makefile.
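
One thing the listing suggests: the folder contains source.bpe and target.bpe but no source.spm or target.spm, which would be consistent with source_spm being None in the traceback. A rough sketch for checking a model folder before conversion follows. The expected file names are an assumption based on the files MarianTokenizer normally loads, and missing_tokenizer_files is a hypothetical helper, not part of either conversion script:

```python
from pathlib import Path

# Assumed file names; verify them against your installed transformers version.
EXPECTED_FILES = ("source.spm", "target.spm", "vocab.json")


def missing_tokenizer_files(model_dir: str) -> list:
    """Return the expected tokenizer files that are absent from model_dir."""
    folder = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # "opus-en-pt" is the --model-path value used in the command above.
    missing = missing_tokenizer_files("opus-en-pt")
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all expected tokenizer files are present")
```

For a folder with the contents listed above, this would report source.spm and target.spm as missing, since only the BPE files and vocab.json are present.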