developeranalyser closed this issue 5 months ago
I'm also hitting this - how did you solve this?
easy
Hi, can anyone tell me how to overcome this error? It is caused by the units in the FLEURS dataset being null, i.e. the units are not extracted from the audio. Do we have to extract them manually, or something?
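If the units really are null in your manifest, one way to produce them offline is the unit extractor that ships with this repo. A minimal sketch, assuming the UnitExtractor API described in the repo's unit-extraction docs (the xlsr2_1b_v2 model card, the public kmeans_10k.npy checkpoint, and the output layer index are taken from those docs; the audio path is a placeholder):

```python
import torch
from seamless_communication.models.unit_extractor import UnitExtractor

# Placeholder path to a 16 kHz mono wav file.
audio = "/path/to/audio.wav"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Model card and kmeans checkpoint as given in the unit-extraction docs.
unit_extractor = UnitExtractor(
    "xlsr2_1b_v2",
    "https://dl.fbaipublicfiles.com/seamlessM4T/models/unit_extraction/kmeans_10k.npy",
    device=device,
)

# The docs use features from the 35th layer (0-indexed: 34).
units = unit_extractor.predict(audio, 35 - 1)
print(units)
```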
If srcLang and tgtLang are the same, fine-tuning now works. Why is that?
```
2024-04-16 13:24:40,044 INFO -- seamless_communication.cli.m4t.finetune.trainer.11840: Start finetuning
2024-04-16 13:24:40,044 INFO -- seamless_communication.cli.m4t.finetune.trainer.11840: Run evaluation
  0% 0/78 [00:09<?, ?it/s]
Traceback (most recent call last):
  File "/content/my_seamless_communication/src/finetune.py", line 200, in <module>
    main()
  File "/content/my_seamless_communication/src/finetune.py", line 196, in main
    finetune.run()
  File "/content/my_seamless_communication/src/seamless_communication/cli/m4t/finetune/trainer.py", line 386, in run
    self._eval_model()
  File "/content/my_seamless_communication/src/seamless_communication/cli/m4t/finetune/trainer.py", line 331, in _eval_model
    loss = self.calc_loss(batch, self.model(batch))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/my_seamless_communication/src/seamless_communication/cli/m4t/finetune/trainer.py", line 123, in forward
    assert batch.text_to_units.prev_output_tokens is not None
AssertionError
[2024-04-16 13:24:53,644] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 11840) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
```
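Since the assertion fires on `batch.text_to_units.prev_output_tokens`, one quick check before training is to scan the fine-tuning manifest for samples without unit tokens. A minimal sketch, assuming a JSON-lines manifest where the units sit under `target["units"]` (both the file name and the field path are assumptions; adjust them to your dataset layout):

```python
import json
from pathlib import Path

# Hypothetical manifest produced by the dataset-preparation step; adjust the path.
manifest_path = Path("train_manifest.json")

missing = 0
total = 0
with manifest_path.open() as f:
    for line in f:
        total += 1
        sample = json.loads(line)
        # Assumption: unit tokens are stored under sample["target"]["units"].
        units = sample.get("target", {}).get("units")
        if not units:
            missing += 1

print(f"{missing}/{total} samples have no target units")
```

If a large fraction of samples report no units, that matches the explanation above: the T2U branch has nothing to train on, and the evaluation batch trips the assertion.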