ShiqiYu / OpenGait

A flexible and extensible framework for gait recognition. You can focus on designing your own models and comparing with state-of-the-arts easily with the help of OpenGait.

torch.distributed.elastic.multiprocessing.errors.ChildFailedError: #202

Closed ReinerBRO closed 1 week ago

ReinerBRO commented 2 months ago

I encountered the two errors below when I ran this command:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 opengait/main.py --cfgs ./configs/gaitset/gaitset.yaml --phase train --log_to_file

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1639) of binary: /root/miniconda3/bin/python
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

Environment:
GPU: 1 Nvidia 3090
PyTorch: 1.10.0+cu113
Python: 3.8 (Ubuntu 20.04)
CUDA: 11.3
torchvision: 0.11.1+cu113

Can anyone help me out with this? Many thanks!

ReinerBRO commented 2 months ago

I solved this problem. It was caused by my incorrect data pretreatment.
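
For anyone who lands here with the same message: ChildFailedError only says that the worker process exited, so checking the preprocessed data directly can help narrow things down. Below is a minimal sketch, assuming the pretreatment step writes a directory tree of .pkl files. That layout and the helper name find_bad_pickles are assumptions for illustration, not something confirmed in this thread or part of OpenGait.

```python
import pickle
from pathlib import Path


def find_bad_pickles(dataset_root):
    """Return (path, reason) pairs for .pkl files that fail to load or are empty.

    Hypothetical helper: assumes the preprocessed dataset is a directory
    tree containing .pkl files, which may not match your setup.
    """
    bad = []
    for path in sorted(Path(dataset_root).rglob("*.pkl")):
        try:
            with path.open("rb") as f:
                data = pickle.load(f)
        except Exception as exc:
            # Truncated, corrupt, or otherwise unloadable file.
            bad.append((path, f"unreadable: {type(exc).__name__}"))
            continue
        # len() works for lists and array-like objects; anything
        # without a length is left alone.
        if hasattr(data, "__len__") and len(data) == 0:
            bad.append((path, "empty sequence"))
    return bad
```

Calling find_bad_pickles on the dataset root used in your config and printing the result lists the files worth regenerating. An empty result does not prove the data is correct; it only rules out unreadable and empty files.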
