Open Lyaaaaaaaaaaaaaaa opened 2 years ago
Could you try this script: https://github.com/Helsinki-NLP/Opus-MT/blob/master/hf/convert_to_pytorch.py
Hello, I will try this one and update you.
Hello, sorry for the long delay.
I ran your script and got another error: TypeError: expected str, bytes or os.PathLike object, not NoneType. The command, the full traceback, and my conda environment are below.
python3 model_converter/convert_to_pytorch.py --model-path opus-en-pt --dest-path converted/opus-en-pt
added 1 tokens to vocab
Traceback (most recent call last):
File "/home/path_to_project/model_converter/convert_to_pytorch.py", line 28, in <module>
convert(Path(args.model_path), Path(args.dest_path))
File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/convert_marian_to_pytorch.py", line 663, in convert
opus_state = OpusState(source_dir)
File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/convert_marian_to_pytorch.py", line 494, in __init__
self.tokenizer = self.load_tokenizer()
File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/convert_marian_to_pytorch.py", line 593, in load_tokenizer
return MarianTokenizer.from_pretrained(str(self.source_dir))
File "/home/path_to_env/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained
return cls._from_pretrained(
File "/home/path_to_env/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1958, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/path_to_env/lib/python3.9/site-packages/transformers/models/marian/tokenization_marian.py", line 158, in __init__
assert Path(source_spm).exists(), f"cannot find spm source {source_spm}"
File "/home/path_to_env/lib/python3.9/pathlib.py", line 1082, in __new__
self = cls._from_parts(args, init=False)
File "/home/path_to_env/lib/python3.9/pathlib.py", line 707, in _from_parts
drv, root, parts = self._parse_args(args)
File "/home/path_to_env/lib/python3.9/pathlib.py", line 691, in _parse_args
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
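For readers following the traceback: the last frames show Path(source_spm) being built inside MarianTokenizer's constructor, so the TypeError most likely means source_spm was None when it arrived there, that is, no SentencePiece file was resolved from the model directory. The sketch below only reproduces that failure mode with the standard library; the helper name raises_type_error is made up for illustration and is not part of transformers.

```python
from pathlib import Path


def raises_type_error(source_spm) -> bool:
    """Return True if building a Path from source_spm fails with TypeError.

    This mimics the failing expression from the traceback,
    Path(source_spm).exists(), outside of transformers.
    """
    try:
        Path(source_spm).exists()
    except TypeError:
        return True
    return False


# None reproduces the crash; a plain file name does not raise,
# even when the file does not exist (exists() just returns False).
print(raises_type_error(None))
print(raises_type_error("source.spm"))
```

In other words, the assert on that line never gets the chance to print its "cannot find spm source" message, because Path(None) fails first.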
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
accelerate 0.18.0 pyhd8ed1ab_0 conda-forge
aiohttp 3.8.4 py39h72bdee0_0 conda-forge
aiosignal 1.3.1 pyhd8ed1ab_0 conda-forge
arrow-cpp 11.0.0 ha770c72_13_cpu conda-forge
async-timeout 4.0.2 pyhd8ed1ab_0 conda-forge
attrs 22.2.0 pyh71513ae_0 conda-forge
aws-c-auth 0.6.26 hf365957_1 conda-forge
aws-c-cal 0.5.21 h48707d8_2 conda-forge
aws-c-common 0.8.14 h0b41bf4_0 conda-forge
aws-c-compression 0.2.16 h03acc5a_5 conda-forge
aws-c-event-stream 0.2.20 h00877a2_4 conda-forge
aws-c-http 0.7.6 hf342b9f_0 conda-forge
aws-c-io 0.13.19 h5b20300_3 conda-forge
aws-c-mqtt 0.8.6 hc4349f7_12 conda-forge
aws-c-s3 0.2.7 h909e904_1 conda-forge
aws-c-sdkutils 0.1.8 h03acc5a_0 conda-forge
aws-checksums 0.1.14 h03acc5a_5 conda-forge
aws-crt-cpp 0.19.8 hf7fbfca_12 conda-forge
aws-sdk-cpp 1.10.57 h17c43bd_8 conda-forge
brotlipy 0.7.0 py39hb9d737c_1005 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f98852_0 conda-forge
ca-certificates 2022.12.7 ha878542_0 conda-forge
certifi 2022.12.7 pyhd8ed1ab_0 conda-forge
cffi 1.15.1 py39he91dace_3 conda-forge
charset-normalizer 2.1.1 pyhd8ed1ab_0 conda-forge
click 8.1.3 unix_pyhd8ed1ab_2 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
cryptography 40.0.1 py39h079d5ae_0 conda-forge
cudatoolkit 11.8.0 h37601d7_11 conda-forge
cudnn 8.4.1.50 hed8a83a_0 conda-forge
dataclasses 0.8 pyhc8e2a94_3 conda-forge
datasets 2.11.0 pyhd8ed1ab_0 conda-forge
dill 0.3.6 pyhd8ed1ab_1 conda-forge
filelock 3.10.7 pyhd8ed1ab_0 conda-forge
frozenlist 1.3.3 py39hb9d737c_0 conda-forge
fsspec 2023.3.0 pyhd8ed1ab_1 conda-forge
gflags 2.2.2 he1b5a44_1004 conda-forge
glog 0.6.0 h6f12383_0 conda-forge
huggingface_hub 0.13.3 pyhd8ed1ab_0 conda-forge
icu 72.1 hcb278e6_0 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
importlib-metadata 6.1.0 pyha770c72_0 conda-forge
importlib_metadata 6.1.0 hd8ed1ab_0 conda-forge
joblib 1.2.0 pyhd8ed1ab_0 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.20.1 h81ceb04_0 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
libabseil 20230125.0 cxx17_hcb278e6_1 conda-forge
libarrow 11.0.0 h93537a5_13_cpu conda-forge
libblas 3.9.0 16_linux64_openblas conda-forge
libbrotlicommon 1.0.9 h166bdaf_8 conda-forge
libbrotlidec 1.0.9 h166bdaf_8 conda-forge
libbrotlienc 1.0.9 h166bdaf_8 conda-forge
libcblas 3.9.0 16_linux64_openblas conda-forge
libcrc32c 1.1.2 h9c3ff4c_0 conda-forge
libcurl 7.88.1 hdc1c0ab_1 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libevent 2.1.10 h28343ad_4 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 12.2.0 h65d4601_19 conda-forge
libgfortran-ng 12.2.0 h69a702a_19 conda-forge
libgfortran5 12.2.0 h337968e_19 conda-forge
libgoogle-cloud 2.8.0 h0bc5f78_1 conda-forge
libgrpc 1.52.1 hcf146ea_1 conda-forge
libhwloc 2.9.0 hd6dc26d_0 conda-forge
libiconv 1.17 h166bdaf_0 conda-forge
liblapack 3.9.0 16_linux64_openblas conda-forge
libnghttp2 1.52.0 h61bc06f_0 conda-forge
libnuma 2.0.16 h0b41bf4_1 conda-forge
libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge
libprotobuf 3.21.12 h3eb15da_0 conda-forge
libsentencepiece 0.1.97 h47aad16_1 conda-forge
libsqlite 3.40.0 h753d276_0 conda-forge
libssh2 1.10.0 hf14f497_3 conda-forge
libstdcxx-ng 12.2.0 h46fd767_19 conda-forge
libthrift 0.18.1 h5e4af38_0 conda-forge
libutf8proc 2.8.0 h166bdaf_0 conda-forge
libxml2 2.10.3 hfdac1af_6 conda-forge
libzlib 1.2.13 h166bdaf_4 conda-forge
llvm-openmp 16.0.0 h417c0b6_0 conda-forge
lz4-c 1.9.4 hcb278e6_0 conda-forge
magma 2.6.2 hc72dce7_0 conda-forge
mkl 2022.2.1 h84fe81f_16997 conda-forge
multidict 6.0.4 py39h72bdee0_0 conda-forge
multiprocess 0.70.14 py39hb9d737c_3 conda-forge
nccl 2.14.3.1 h0800d71_0 conda-forge
ncurses 6.3 h27087fc_1 conda-forge
ninja 1.11.1 h924138e_0 conda-forge
numpy 1.24.2 py39h7360e5f_0 conda-forge
openssl 3.1.0 h0b41bf4_0 conda-forge
orc 1.8.3 hfdbbad2_0 conda-forge
packaging 23.0 pyhd8ed1ab_0 conda-forge
pandas 1.5.3 py39h2ad29b5_1 conda-forge
parquet-cpp 1.5.1 2 conda-forge
pip 23.0.1 pyhd8ed1ab_0 conda-forge
psutil 5.9.4 py39hb9d737c_0 conda-forge
pyarrow 11.0.0 py39hf0ef2fd_13_cpu conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pyopenssl 23.1.1 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.9.7 hf930737_3_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-xxhash 3.2.0 py39h72bdee0_0 conda-forge
python_abi 3.9 3_cp39 conda-forge
pytorch 1.13.1 cuda112py39hb0b7ed5_200 conda-forge
pytz 2023.3 pyhd8ed1ab_0 conda-forge
pyyaml 6.0 py39hb9d737c_5 conda-forge
re2 2023.02.02 hcb278e6_0 conda-forge
readline 8.2 h8228510_1 conda-forge
regex 2023.3.23 py39h72bdee0_0 conda-forge
requests 2.28.2 pyhd8ed1ab_1 conda-forge
responses 0.18.0 pyhd8ed1ab_0 conda-forge
s2n 1.3.41 h3358134_0 conda-forge
sacremoses 0.0.53 pyhd8ed1ab_0 conda-forge
sentencepiece 0.1.97 hf3d152e_1 conda-forge
sentencepiece-python 0.1.97 py39h0fce851_1 conda-forge
sentencepiece-spm 0.1.97 h47aad16_1 conda-forge
setuptools 67.6.1 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sleef 3.5.1 h9b69904_2 conda-forge
snappy 1.1.10 h9fff704_0 conda-forge
sqlite 3.40.0 h4ff8645_0 conda-forge
tbb 2021.8.0 hf52228f_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
tokenizers 0.13.2 py39h585fa2d_0 conda-forge
tqdm 4.65.0 pyhd8ed1ab_1 conda-forge
transformers 4.27.4 pyhd8ed1ab_0 conda-forge
typing-extensions 4.5.0 hd8ed1ab_0 conda-forge
typing_extensions 4.5.0 pyha770c72_0 conda-forge
tzdata 2023c h71feb2d_0 conda-forge
ucx 1.14.0 h538f049_0 conda-forge
urllib3 1.26.15 pyhd8ed1ab_0 conda-forge
websockets 10.4 py39hb9d737c_1 conda-forge
wheel 0.40.0 pyhd8ed1ab_0 conda-forge
xxhash 0.8.1 h0b41bf4_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
yarl 1.8.2 py39hb9d737c_0 conda-forge
zipp 3.15.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 h166bdaf_4 conda-forge
zstd 1.5.2 h3eb15da_6 conda-forge
Did you download the model that you want to convert? The script expects the model to be in the model path you specify on the command line. This Makefile may help you see how I use the script to convert models: https://github.com/Helsinki-NLP/Opus-MT/blob/master/hf/Makefile
Hello, yes, I downloaded the model I want to convert, opus-en-pt. I believe I downloaded it in the right format. Just in case, here is the list of files in the opus-en-pt folder:
decoder.yml
opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz
opus.bpe32k-bpe32k.transformer.valid1.log
postprocess.sh
README.md
source.tcmodel
tokenizer_config.json
LICENSE
opus.bpe32k-bpe32k.transformer.train1.log
opus.bpe32k-bpe32k.vocab.yml
preprocess.sh
source.bpe
target.bpe
vocab.json
I have difficulty understanding the Makefile.
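One thing that stands out in the file list above: it contains source.bpe and target.bpe but no source.spm or target.spm. A quick pre-flight check along the following lines can make that visible before running the converter. This is a minimal sketch under an assumption: that the tokenizer loader looks for the four file names in EXPECTED_TOKENIZER_FILES inside the model directory. The helper missing_tokenizer_files is hypothetical and not part of transformers or Opus-MT. The demo recreates the listing as empty files in a temporary directory, so it does not touch a real model.

```python
import tempfile
from pathlib import Path

# Assumption: the file names the Marian tokenizer tries to load from a
# local model directory. Adjust if your transformers version differs.
EXPECTED_TOKENIZER_FILES = (
    "source.spm",
    "target.spm",
    "vocab.json",
    "tokenizer_config.json",
)


def missing_tokenizer_files(model_dir, expected=EXPECTED_TOKENIZER_FILES):
    """Return the expected file names that are not present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in expected if not (model_dir / name).is_file()]


# File names copied from the listing in this thread.
LISTED_FILES = [
    "decoder.yml",
    "opus.bpe32k-bpe32k.transformer.model1.npz.best-perplexity.npz",
    "opus.bpe32k-bpe32k.transformer.valid1.log",
    "postprocess.sh",
    "README.md",
    "source.tcmodel",
    "tokenizer_config.json",
    "LICENSE",
    "opus.bpe32k-bpe32k.transformer.train1.log",
    "opus.bpe32k-bpe32k.vocab.yml",
    "preprocess.sh",
    "source.bpe",
    "target.bpe",
    "vocab.json",
]

# Recreate the listing as empty files and run the check against it.
with tempfile.TemporaryDirectory() as tmp:
    for name in LISTED_FILES:
        (Path(tmp) / name).touch()
    missing = missing_tokenizer_files(tmp)

print(missing)  # ['source.spm', 'target.spm']
```

If the check reports the .spm files as missing, the download is probably a BPE-based release rather than a SentencePiece-based one, which would be consistent with the NoneType error earlier in the thread. Note that vocab.json and tokenizer_config.json may have been written by the conversion attempt itself (the "added 1 tokens to vocab" line suggests the vocabulary step ran), so their presence does not by itself show that the download was complete.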
Hello, I'm trying to convert more models to the PyTorch format, but I'm getting an error.
I'm running the convert_marian_tatoeba_to_pytorch script, and it seems to be looking for a readme.md file in the models/results folder, but there is none.