Open Taldhi opened 7 months ago
This is due to a version conflict in the installation environment. Please follow the latest requirements.
I set up a fresh environment using your updated requirements.txt. I am using a single GPU, a Quadro GV100 (32 GB), and training with your scripts/test_condition/train_imageae.sh, but I am still facing the same issue, as follows.
I used a small portion of the Mixkit dataset and adjusted the JSON file accordingly.
Steps: 0%| | 0/1000000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "opensora/train/train_t2v.py", line 807, in
@LinB203 please take another look at this error and suggest how to resolve it. It would really help me proceed. Thank you for your time.
We have encountered the following errors while attempting to execute the train_vidae.sh script.
[2024-04-10 10:23:00,020] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1115) of binary: /home/pritam/anaconda3/envs/opensora/bin/python
Traceback (most recent call last):
File "/home/pritam/anaconda3/envs/opensora/bin/accelerate", line 8, in
sys.exit(main())
File "/home/pritam/anaconda3/envs/opensora/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/home/pritam/anaconda3/envs/opensora/lib/python3.8/site-packages/accelerate/commands/launch.py", line 1042, in launch_command
deepspeed_launcher(args)
File "/home/pritam/anaconda3/envs/opensora/lib/python3.8/site-packages/accelerate/commands/launch.py", line 754, in deepspeed_launcher
distrib_run.run(args)
File "/home/pritam/anaconda3/envs/opensora/lib/python3.8/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/home/pritam/anaconda3/envs/opensora/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/pritam/anaconda3/envs/opensora/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
AttributeError: 'FieldInfo' object has no attribute 'required'
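Since the earlier reply attributes this to a version conflict, it may help to post the versions that are actually installed in the environment. The snippet below is only a rough diagnostic sketch, not something from the repository. The package list and the helper names (`installed_versions`, `major`) are my own choices for illustration. The pydantic check rests on an assumption the thread does not confirm: that this AttributeError comes from pydantic 2.x being installed next to a library written against the pydantic 1.x API.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: installed version string, or None if missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def major(version):
    """Return the leading major version number, or None if not installed."""
    return int(version.split(".")[0]) if version else None


if __name__ == "__main__":
    # Hypothetical list of packages worth reporting; adjust as needed.
    found = installed_versions(["pydantic", "deepspeed", "accelerate", "torch"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")

    # Assumption (not confirmed in this thread): pydantic 2.x is the culprit.
    if (major(found["pydantic"]) or 0) >= 2:
        print("pydantic 2.x detected; compare against the pinned requirements.")
```

Run it with the same interpreter that the training script uses (here, the `opensora` conda environment) and compare the printed versions against requirements.txt.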