When I tried to run this command:
./tools/dist_train.sh configs/pretrain/yolo_world_v2_x_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py 1 --amp
I got this error:
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
How can I debug it?
Thanks a lot