facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Error when fine-tuning the model: ModuleNotFoundError: No module named 'fairseq2.models.unity' #40

Open Subarasheese opened 1 year ago

Subarasheese commented 1 year ago

Hello, I am following the fine-tuning guide with the following command:

torchrun --rdzv-backend=c10d --rdzv-endpoint=localhost:0 --nnodes=1 --nproc-per-node=8 --no-python python finetune.py --mode SPEECH_TO_SPEECH --train_dataset ./m4t_dataset/train_manifest.json --eval_dataset ./m4t_dataset/validation_manifest.json --learning_rate 1e-6 --warmup_steps 100 --max_epochs 10 --patience 3 --model_name seamlessM4T_large --save_model_to ./m4t_dataset/checkpoint.pt

However, this happens:


Traceback (most recent call last):
  File "/home/privateserver/Coding/seamless_communication/scripts/m4t/finetune/finetune.py", line 16, in <module>
    import trainer
  File "/home/privateserver/Coding/seamless_communication/scripts/m4t/finetune/trainer.py", line 21, in <module>
    from fairseq2.models.unity import UnitYModel
ModuleNotFoundError: No module named 'fairseq2.models.unity'
[The same traceback is printed by each of the other seven worker processes.]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 87402) of binary: python
Traceback (most recent call last):
  File "/home/privateserver/Coding/seamless_communication/venv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/privateserver/Coding/seamless_communication/venv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/privateserver/Coding/seamless_communication/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/privateserver/Coding/seamless_communication/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/privateserver/Coding/seamless_communication/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/privateserver/Coding/seamless_communication/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
python FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 87403)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 87404)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 87405)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 4 (local_rank: 4)
  exitcode  : 1 (pid: 87406)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 5 (local_rank: 5)
  exitcode  : 1 (pid: 87407)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 6 (local_rank: 6)
  exitcode  : 1 (pid: 87408)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 7 (local_rank: 7)
  exitcode  : 1 (pid: 87409)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-23_01:55:36
  host      : privateserver
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 87402)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Can someone tell me how to fix this? Thank you

mavlyutovr commented 1 year ago

@Subarasheese could you please pull the latest changes from the main branch and re-install the package?

git checkout main
git pull 
pip install -U .

mberman84 commented 1 year ago

I'm getting this error during the initial installation, which looks related:

ERROR: Could not find a version that satisfies the requirement fairseq2n==0.1.0 (from fairseq2) (from versions: none)
ERROR: No matching distribution found for fairseq2n==0.1.0

kauterry commented 1 year ago

Are you on Linux, rather than macOS or Windows? fairseq2 currently supports only Linux.
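
For anyone unsure what their environment reports, here is a minimal sketch of a platform check to run before installing. The helper name `is_linux` is made up for illustration and is not part of fairseq2 or seamless_communication. It only checks the operating system name, so it cannot tell whether a matching fairseq2n wheel exists for your Python or PyTorch version.

```python
import platform
from typing import Optional


def is_linux(system_name: Optional[str] = None) -> bool:
    """Return True if the given (or current) OS name is Linux.

    Hypothetical helper for illustration; pass a name explicitly to test it.
    """
    name = platform.system() if system_name is None else system_name
    return name == "Linux"


if __name__ == "__main__":
    # Print the facts that are useful to paste into a bug report.
    print(f"OS: {platform.system()} ({platform.machine()})")
    print(f"Python: {platform.python_version()}")
    print(f"Linux: {is_linux()}")
```

If this prints `Linux: False`, the "No matching distribution found" error above is the expected outcome according to the comment above.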

Subarasheese commented 1 year ago

> @Subarasheese could you please pull the latest changes from the main branch and re-install the package?
>
> git checkout main
> git pull
> pip install -U .

I just did that, but unfortunately the problem is the same (ModuleNotFoundError: No module named 'fairseq2.models.unity').
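
When a reinstall does not change the error, it can help to check what the interpreter actually sees. The sketch below is a diagnostic only, not something from the fine-tuning guide. It reports whether a dotted module can be located and which version of its top-level package is installed. The helper name `describe_module` is made up for illustration, and the sketch assumes the distribution name matches the top-level package name (`fairseq2`).

```python
import importlib.metadata
import importlib.util


def describe_module(module_name: str) -> dict:
    """Report whether a dotted module can be found and the installed
    version of its top-level package (hypothetical helper)."""
    top_level = module_name.split(".")[0]

    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # find_spec imports parent packages; this is raised when one is missing.
        spec = None

    try:
        # Assumes the distribution is named like the top-level package.
        version = importlib.metadata.version(top_level)
    except importlib.metadata.PackageNotFoundError:
        version = None

    return {
        "module": module_name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
        "version": version,
    }


if __name__ == "__main__":
    for name in ("fairseq2", "fairseq2.models", "fairseq2.models.unity"):
        print(describe_module(name))
```

Run it with the same `python` that torchrun launches. If `fairseq2` is found but `fairseq2.models.unity` is not, the installed fairseq2 version and the checked-out scripts probably do not match, and the printed location shows which site-packages directory is being used.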

cndn commented 1 year ago

Hey @Subarasheese, you probably need to purge the cache. It's also on us to update the lib version. Could you try

pip install --no-cache-dir .

or the --force-reinstall option?
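
If more than one Python is installed, `pip` on the PATH may belong to a different interpreter than the one running the script. A rough sketch of building the same reinstall command through `sys.executable -m pip` follows, so that the package lands in the environment of the interpreter that runs it. The function name `build_reinstall_command` is made up for illustration. The script only prints the command and does not run pip.

```python
import shlex
import sys
from typing import List


def build_reinstall_command(source_dir: str = ".", force: bool = False) -> List[str]:
    """Build a pip command that bypasses the cache, bound to this interpreter.

    Hypothetical helper; the flags are the ones suggested in the comment above.
    """
    command = [sys.executable, "-m", "pip", "install", "--no-cache-dir"]
    if force:
        command.append("--force-reinstall")
    command.append(source_dir)
    return command


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(build_reinstall_command(force=True)))
```

To execute it, pass the returned list to `subprocess.run(command, check=True)` from the repository root.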