facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Try to finetune NLLB but got an error: "Can't instantiate abstract class TrainModule with abstract methods requirements" #4989

Open robotsp opened 1 year ago

robotsp commented 1 year ago

🐛 Bug

Trying to fine-tune NLLB fails with this error:

Can't instantiate abstract class TrainModule with abstract methods requirements

CMD

python /fairseq-nllb/examples/nllb/modeling/train/train_script.py \
    cfg=bilingual \
    cfg/dataset=$DATA_CONFIG \
    cfg.dataset.lang_pairs="$SRC-$TGT" \
    cfg.fairseq_root=$FAIRSEQ_ROOT \
    cfg.output_dir=$OUTPUT_DIR \
    cfg.dropout=$DROP \
    cfg.warmup=10 \
    cfg.finetune_from_model=$MODEL_FOLDER/checkpoint.pt

Complete Error

The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_path="conf", config_name="base_config")
/usr/local/lib/python3.8/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Error executing job with overrides: ['cfg=bilingual', 'cfg/dataset=fbseed_bilingual.yaml', 'cfg.fairseq_root=/fairseq-nllb', 'cfg.output_dir=/output_nllb', 'cfg.dropout=0.1', 'cfg.warmup=10', 'cfg.finetune_from_model=/output_nllb/nllb_model/checkpoint.pt']
Traceback (most recent call last):
  File "/fairseq-nllb/examples/nllb/modeling/train/train_script.py", line 289, in main
    train_module = TrainModule(config)
TypeError: Can't instantiate abstract class TrainModule with abstract methods requirements

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Environment

Additional context

ibtiRaj commented 1 year ago

@robotsp hey, I got this error when I reinstalled Stopes with the new version. I think Fairseq is not compatible with the new version of Stopes. I solved this problem by using the old version.
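Before pinning or downgrading, it can help to confirm which versions are actually installed in the environment that runs the training script. The snippet below is a minimal sketch using only the standard library; the distribution names `stopes` and `fairseq` are assumptions about how the packages were installed (an editable or source install may register a different name, or none at all).

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them to match your install.
for name in ("stopes", "fairseq"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Comparing this output between a working and a failing environment is one way to narrow down which stopes release introduced the mismatch.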

robotsp commented 1 year ago

Thanks @ibtiRaj

robotsp commented 1 year ago

> @robotsp hey, I got this error when I reinstalled Stopes with the new version. I think Fairseq is not compatible with the new version of Stopes. I solved this problem by using the old version.

@ibtiRaj But it seems you hit a new error when you downgraded stopes to the old version, right? And @kauterry told you (https://github.com/facebookresearch/stopes/issues/24) to install the new version of stopes to solve that error. I'm confused about whether there is an end-to-end solution :)

Best,

ibtiRaj commented 1 year ago

@robotsp I'm confused too, I don't know what to do.

ibtiRaj commented 1 year ago

@robotsp Can you tell me what your system configuration is, i.e. number of GPUs, GPU memory and system memory (RAM)?

robotsp commented 1 year ago

> @robotsp Can you tell me what your system configuration is, i.e. number of GPUs, GPU memory and system memory (RAM)?

8 GPUs, 48 CPUs, 480GB Mem @ibtiRaj. By the way, could you please share your run scripts and the config files that you altered? Thanks!

ibtiRaj commented 1 year ago

Hi @robotsp, to fine-tune the NLLB model I use this command:

srun python /home/admin/khadija/fairseq/examples/nllb/modeling/train/train_script.py cfg=nllb200_dense3.3B_finetune_on_fbseed cfg/dataset=bilingual cfg.dataset.lang_pairs=ary_Arab-eng_Latn cfg.fairseq_root=/home/admin/khadija/fairseq cfg.output_dir=/home/admin/khadija/storagenas/fine_tune_nllb_output/model_fine_tuned cfg.dropout=0.1 cfg.warmup=10 cfg.finetune_from_model=/home/admin/khadija/storagenas/projects/NLLB_modeles/checkpoint.pt

and here are my configuration files:

image

image

Is that what you meant?

The-Next commented 1 year ago

Hello, I have the same problem as you. I found that the problem might be in stopes. As the error reports, the abstract method requirements is not implemented in TrainModule. This is perhaps because the nllb and stopes versions do not correspond: earlier versions of stopes may not have had a requirements method. So I deleted the requirements method in stopes.stopes.core.stopes_module, and the program now runs normally. I hope this helps you.
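The mechanism behind the error can be reproduced without stopes or fairseq. The sketch below uses hypothetical stand-in classes (`BaseModule` is not the real stopes class, and this `TrainModule` is not the fairseq one): a base class declares `requirements` as abstract, the subclass does not override it, and Python refuses to instantiate the subclass.

```python
from abc import ABC, abstractmethod


class BaseModule(ABC):
    """Hypothetical stand-in for the stopes base module class."""

    def __init__(self, config):
        self.config = config

    @abstractmethod
    def requirements(self):
        """Subclasses must describe the resources they need."""


class TrainModule(BaseModule):
    """Stand-in for the training module; it never defines requirements()."""

    def run(self):
        return "training"


def try_instantiate(cls, config=None):
    """Return an instance, or the TypeError message if instantiation fails."""
    try:
        return cls(config)
    except TypeError as err:
        return str(err)


# Instantiation fails because requirements() is still abstract in TrainModule.
message = try_instantiate(TrainModule)
print(message)
```

The exact wording of the message varies between Python versions, but it names both the class and the missing method. Deleting the abstract method from the base class, as described above, removes the check; overriding the method in the subclass satisfies it without editing the installed library.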

martinbombin commented 1 year ago

Hello, I have implemented the abstract method on my own in fairseq/examples/nllb/modeling/train/train_script.py (pretty simple).

Screenshot from 2023-03-07 11-34-23

It worked for me. However, when I try to load the model, I get errors. I am also trying to fine-tune it, and it seems to be initialising the model with my vocabulary instead of the vocabulary it was trained on. Does anyone have a solution to this problem?
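The screenshot is not reproduced in this text, so the following is only a sketch of what overriding the abstract method could look like, not the code from the comment above. Everything in it is a hypothetical stand-in: the `Requirements` dataclass, its field names, the `BaseModule` class, and the resource values are assumptions for illustration. Check the installed stopes version for the type that `requirements` is actually expected to return.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Requirements:
    """Hypothetical resource description; field names are illustrative only."""

    nodes: int = 1
    gpus_per_node: int = 1
    cpus_per_task: int = 4
    mem_gb: Optional[int] = None
    timeout_min: int = 60


class BaseModule(ABC):
    """Hypothetical stand-in for the base class that declares requirements()."""

    def __init__(self, config):
        self.config = config

    @abstractmethod
    def requirements(self) -> Requirements:
        ...


class TrainModule(BaseModule):
    """Stand-in training module that now implements the abstract method."""

    def requirements(self) -> Requirements:
        # Placeholder values; choose numbers that match your own cluster.
        return Requirements(nodes=1, gpus_per_node=8, cpus_per_task=6)


# With the override in place, instantiation no longer raises TypeError.
train_module = TrainModule(config={"lang_pairs": "src-tgt"})
reqs = train_module.requirements()
print(reqs)
```

An override like this keeps the installed library untouched, which may be easier to maintain than deleting the method from stopes itself.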