fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Apache License 2.0

Training gets stuck at `torch.distributed.init_process_group` #204

Open RedBlack888 opened 1 year ago

RedBlack888 commented 1 year ago

The command I run for training Deformable DETR on one node with 8 GPUs is the following:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_deformable_detr.sh

It works the first time. However, when I start training again, the process gets stuck at the following lines:

    print('| distributed init (rank {}): {}'.format(
        args.rank, args.dist_url), flush=True)
    torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
                                         world_size=args.world_size, rank=args.rank)
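
For context, `args.dist_url` here typically points at an `env://`-style TCP rendezvous on `MASTER_ADDR:MASTER_PORT` set by the launch script, so if a leftover process from the previous run is still holding that port, every new rank blocks inside this call. Below is a minimal sketch of the same call with an explicit timeout so a broken rendezvous raises an error instead of hanging forever (the backend, the port default, and the 10-minute value are illustrative assumptions, not the repo's exact settings):

    # Hedged sketch: same rendezvous as above, but with an explicit timeout so a
    # stuck initialization raises instead of blocking forever. Values are illustrative.
    import datetime
    import os

    import torch.distributed as dist

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # assumption: single-node run
    os.environ.setdefault("MASTER_PORT", "29500")      # assumption: common default port

    dist.init_process_group(
        backend="nccl",                                # GPU backend used for training
        init_method="env://",                          # rendezvous via MASTER_ADDR/MASTER_PORT
        world_size=int(os.environ.get("WORLD_SIZE", "1")),
        rank=int(os.environ.get("RANK", "0")),
        timeout=datetime.timedelta(minutes=10),        # fail fast if peers never join
    )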

I have tried `kill -9 <PID>` on the process found with `netstat -tulpn | grep 2950`, but it did not help.
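
In case it helps anyone hitting the same hang: one way to check whether a stale worker from the previous run is still holding the rendezvous port is to try binding it yourself (29500 is only the usual `torch.distributed` default and an assumption here):

    # Hedged sketch: check whether the master port is still held by a stale process.
    # 29500 is torch.distributed's usual default master port (an assumption here).
    import errno
    import socket

    def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
        """Return True if the port can be bound, i.e. no leftover process holds it."""
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, port))
                return True
            except OSError as e:
                if e.errno == errno.EADDRINUSE:
                    return False
                raise

    if __name__ == "__main__":
        print("port 29500 free:", port_is_free(29500))

If the port is not free, killing the remaining training workers (not just the launcher) or exporting a different MASTER_PORT before relaunching, if the launch script honors it, usually unblocks the rendezvous.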


Could you help me locate the bug?

Best.

mitsuha0210 commented 5 months ago

Hello, have you solved this problem? I have run into the same problem. @RedBlack888