facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Has anyone faced this problem when fine-tuning? #462

Open xufeiqiong opened 4 months ago

xufeiqiong commented 4 months ago

When I run fine-tuning, I get this error:

    Traceback (most recent call last):
      File "/opt/conda/bin/m4t_finetune", line 8, in <module>
        sys.exit(main())
      File "/opt/conda/lib/python3.10/site-packages/seamless_communication/cli/m4t/finetune/finetune.py", line 148, in main
        text_tokenizer = load_unity_text_tokenizer(args.model_name)
      File "/opt/conda/lib/python3.10/site-packages/fairseq2/models/utils/generic_loaders.py", line 353, in __call__
        return self._load(path, card)
      File "/opt/conda/lib/python3.10/site-packages/fairseq2/models/nllb/loader.py", line 88, in _load
        return NllbTokenizer(pathname, langs, default_lang)
      File "/opt/conda/lib/python3.10/site-packages/fairseq2/models/nllb/tokenizer.py", line 43, in __init__
        super().__init__(pathname, control_symbols)
      File "/opt/conda/lib/python3.10/site-packages/fairseq2/data/text/sentencepiece.py", line 142, in __init__
        self.model = SentencePieceModel(pathname, control_symbols)
    RuntimeError: basic_filebuf::underflow error reading the file: Is a directory

My fine-tuning command is:

    torchrun \
      --rdzv-backend=c10d \
      --rdzv-endpoint=localhost:0 \
      --nnodes=1 \
      --nproc-per-node=2 \
      --no-python \
      m4t_finetune \
      --mode SPEECH_TO_TEXT \
      --train_dataset /code/path.json \
      --eval_dataset /code/path1.json \
      --learning_rate 1e-6 \
      --warmup_steps 100 \
      --max_epochs 10 \
      --patience 5 \
      --model_name seamlessM4T_v2_large \
      --save_model_to /code/checkpoint.pt

yiyibooks commented 3 months ago

I ran into the same issue and solved it by downloading the checkpoint manually.
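For anyone debugging this: the "Is a directory" message suggests that the path handed to SentencePieceModel pointed at a directory rather than a model file, which might happen after an incomplete or interrupted download. Below is a minimal sketch, using only the Python standard library, for checking a path before loading it. The function name `describe_asset_path` and the path in the usage note are hypothetical, not part of fairseq2 or seamless_communication.

```python
from pathlib import Path


def describe_asset_path(path_str: str) -> str:
    """Return a short diagnosis of a path that should point at a model file.

    Hypothetical helper for illustration; it is not a fairseq2 API.
    """
    path = Path(path_str)
    if not path.exists():
        return "missing"
    if path.is_dir():
        # This is the case that would produce "Is a directory" on read.
        return "directory"
    if path.stat().st_size == 0:
        # A zero-byte file can be left behind by an interrupted download.
        return "empty file"
    return "ok"
```

Calling something like `print(describe_asset_path("/path/to/your/tokenizer.model"))` with whatever path your setup resolves for the tokenizer should tell you whether a manual re-download is likely to help.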