Open robotsp opened 1 year ago
@robotsp hey, I got this error when I reinstalled Stopes with the new version. I think Fairseq is not compatible with the new version of Stopes. I solved this problem by using the old version.
Thanks @ibtiRaj
@ibtiRaj But it seems a new error appears when you downgrade stopes to the old version, right? And @kauterry suggested to you (https://github.com/facebookresearch/stopes/issues/24) that installing the new version of stopes fixes that error. I'm confused about whether there is an end-to-end solution :)
Best,
@robotsp I'm confused too, I don't know what to do.
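Since the discussion turns on which stopes and fairseq versions work together, it may help to report the exact installed versions. Below is a minimal sketch using only the standard library. It assumes the packages are registered under the distribution names `stopes` and `fairseq`, which may not hold for every source install.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is not found."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them if your install uses different names.
for name in ("stopes", "fairseq"):
    print(name, installed_version(name) or "not installed")
```

Posting this output alongside the error makes it easier to compare a working setup with a failing one.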
@robotsp Can you tell me what your system configuration is, i.e. number of GPUs, GPU memory and system memory (RAM)?
8 GPUs, 48 CPUs, 480 GB of memory, @ibtiRaj. By the way, could you please share your run scripts and the config files you changed? Thanks!
Hi @robotsp, to fine-tune the NLLB model I use this command:
srun python /home/admin/khadija/fairseq/examples/nllb/modeling/train/train_script.py cfg=nllb200_dense3.3B_finetune_on_fbseed cfg/dataset=bilingual cfg.dataset.lang_pairs=ary_Arab-eng_Latn cfg.fairseq_root=/home/admin/khadija/fairseq cfg.output_dir=/home/admin/khadija/storagenas/fine_tune_nllb_output/model_fine_tuned cfg.dropout=0.1 cfg.warmup=10 cfg.finetune_from_model=/home/admin/khadija/storagenas/projects/NLLB_modeles/checkpoint.pt
and here are my configuration files:
Is that what you meant?
Hello, I have the same problem as you.
I found that the problem might be in `stopes`. As the error report says, the abstract method `requirements` is not implemented in TrainModule. Perhaps the `nllb` and `stopes` versions do not match: earlier versions of stopes may not have had `requirements` at all. So I deleted the `requirements` method in `stopes.stopes.core.stopes_module`, and the program now runs normally. I hope this helps.
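To illustrate why Python raises this error, here is a minimal sketch with stand-in classes. `BaseModule` and this `TrainModule` are hypothetical simplifications, not the real stopes or fairseq classes. The point is only that a subclass cannot be instantiated until it implements every abstract method of its base class.

```python
from abc import ABC, abstractmethod


class BaseModule(ABC):
    """Hypothetical stand-in for an abstract base class with two abstract methods."""

    @abstractmethod
    def requirements(self):
        """Subclasses must describe the resources they need."""

    @abstractmethod
    def run(self):
        """Subclasses must implement the actual work."""


class TrainModule(BaseModule):
    """Stand-in for a subclass written before `requirements` became abstract."""

    def run(self):
        return "training"


def try_instantiate(cls):
    """Return an instance, or the TypeError message if instantiation fails."""
    try:
        return cls()
    except TypeError as err:
        return str(err)


# TrainModule does not implement `requirements`, so instantiation fails.
message = try_instantiate(TrainModule)
print(message)
```

The exact wording of the message differs between Python versions, but it names `requirements` as the missing abstract method. Removing the abstract declaration from the base class makes the error go away, as does implementing the method in the subclass.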
Hello, I implemented the abstract method myself in fairseq/examples/nllb/modeling/train/train_script.py (pretty simple).
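A rough sketch of what such an override could look like, using stand-in classes. The `Requirements` dataclass, its field names, and `BaseModule` are hypothetical and only illustrate the pattern; check the stopes source for the real return type and fields. The numbers are illustrative values for a single machine with 8 GPUs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical resource description; field names are illustrative only."""

    nodes: int = 1
    gpus_per_node: int = 1
    cpus_per_task: int = 1
    mem_gb: int = 16


class BaseModule(ABC):
    """Hypothetical stand-in for the abstract base class."""

    @abstractmethod
    def requirements(self) -> Requirements:
        """Subclasses must describe the resources they need."""


class TrainModule(BaseModule):
    """Stand-in subclass that now implements the abstract method."""

    def requirements(self) -> Requirements:
        # Illustrative values for one node with 8 GPUs; adjust to your cluster.
        return Requirements(nodes=1, gpus_per_node=8, cpus_per_task=6, mem_gb=480)


# Instantiation succeeds because every abstract method is implemented.
module = TrainModule()
reqs = module.requirements()
print(reqs)
```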
It worked for me. However, when I try to load the model, I get errors. I am also trying to fine-tune it, and it seems to initialise the model with my vocabulary instead of the vocabulary it was trained on. Does anyone have a solution to this problem?
🐛 Bug
I tried to fine-tune NLLB but got this error:
Can't instantiate abstract class TrainModule with abstract methods requirements
CMD
Complete Error
Environment
How you installed fairseq (pip, source): pip+source
Additional context