Multi-GPU

chenbys commented 3 years ago

Thanks for the excellent paper and code.

Training with `--num-gpus 1` works fine, but `--num-gpus 2` fails with the following error:

```shell
python tools/train_net.py --num-gpus 2 --config-file configs/COCOA_cls-AmodalSegmentation/mask_rcnn_R_50_FPN_1x_parallel_CtRef_VAR_SPRef_SPRet_FM.yaml
```

```
....
  File "/workspace/Amodal-Segmentation-Based-on-Visible-Region-Segmentation-and-Shape-Prior/detectron2/modeling/roi_heads/recon_net.py", line 318, in nearest_decode
    distances = torch.addmm(codebook_sqr + inputs_sqr,
RuntimeError: expected device cuda:0 but got device cuda:1
```
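The error means that `torch.addmm` received operands on two different GPUs: in the training process that runs on `cuda:1`, at least one tensor in the call is still on `cuda:0`. One common cause in PyTorch is a tensor that is stored as a plain module attribute or created with a hard-coded device, because `model.to(device)` only moves registered parameters, buffers and submodules. The sketch below is hypothetical and is not the repository's `nearest_decode`; the class name `CodebookLookup` and the tensor shapes are assumptions. It registers the codebook as a buffer and computes squared distances with `torch.addmm` in the same form as the line in the traceback.

```python
import torch
import torch.nn as nn


class CodebookLookup(nn.Module):
    """Hypothetical nearest-codeword lookup (NOT the code in recon_net.py)."""

    def __init__(self, num_codes: int, dim: int):
        super().__init__()
        # A registered buffer follows model.to(device) / model.cuda(), so each
        # training process keeps the codebook on its own GPU instead of cuda:0.
        self.register_buffer("codebook", torch.randn(num_codes, dim))

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # inputs: (N, dim), codebook: (K, dim)
        # Defensive guard: align the codebook with the device of the inputs.
        codebook = self.codebook.to(inputs.device)
        codebook_sqr = (codebook ** 2).sum(dim=1)            # (K,)
        inputs_sqr = (inputs ** 2).sum(dim=1, keepdim=True)  # (N, 1)
        # ||x - c||^2 = ||x||^2 + ||c||^2 - 2 * x.c
        # addmm computes beta * input + alpha * (mat1 @ mat2); all operands
        # must live on the same device.
        distances = torch.addmm(codebook_sqr + inputs_sqr, inputs, codebook.t(),
                                beta=1.0, alpha=-2.0)  # (N, K)
        # Index of the nearest codeword for each input row.
        return distances.argmin(dim=1)
```

With the codebook registered as a buffer, each process's `model.to(...)` places it on that process's GPU; the extra `.to(inputs.device)` only guards against tensors created elsewhere.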

In addition, the code does not seem to use APIs such as `dist.all_reduce` that aggregate tensors across GPUs. How should this code be run on multiple GPUs? Can it be run with DataParallel (DP) or DistributedDataParallel (DDP)? I should add that I am not familiar with detectron2.

Thanks for any suggestions.
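Regarding `dist.all_reduce`: in detectron2, `DefaultTrainer` (which `tools/train_net.py` normally builds on) wraps the model in `DistributedDataParallel` when more than one process is running, and DDP averages gradients across GPUs during `backward()`. The training loss therefore does not need an explicit `all_reduce`. An explicit reduction is only needed for values computed outside autograd that every GPU must agree on (for example, hypothetical codebook statistics). A minimal sketch of such a helper, assuming `torch.distributed` is used directly; the name `all_reduce_mean` is illustrative:

```python
import torch
import torch.distributed as dist


def all_reduce_mean(tensor: torch.Tensor) -> torch.Tensor:
    """Average a tensor over all processes; a no-op when not running distributed.

    Intended for statistics outside autograd; DDP already averages gradients.
    """
    if not dist.is_available() or not dist.is_initialized():
        return tensor
    world_size = dist.get_world_size()
    if world_size == 1:
        return tensor
    # all_reduce works in place, so reduce a detached copy.
    out = tensor.detach().clone()
    dist.all_reduce(out, op=dist.ReduceOp.SUM)
    return out / world_size
```

Falling back to the input when no process group exists keeps the same code usable in single-GPU runs.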

YutingXiao commented 3 years ago

I tried running detectron2 on multiple GPUs but did not succeed; all of our experiments were run on a single GPU. You may be able to find a solution in the detectron2 GitHub issues.

chenbys commented 3 years ago

Thanks for the comments. I found that the base config runs directly on two GPUs with `python tools/train_net.py --num-gpus 2 --config-file configs/COCOA_cls-AmodalSegmentation/mask_rcnn_R_50_FPN_1x_parallel.yaml`. So the additional components in the full config probably need some modifications to be compatible with multi-GPU training.
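One way to locate the parts of the full model that may need such modifications is to list tensors held as plain attributes of submodules, since `model.to(device)` leaves those wherever they were created. The helper below is a hypothetical debugging sketch, not part of detectron2 or this repository:

```python
import torch
import torch.nn as nn


def find_unregistered_tensors(model: nn.Module):
    """Return (qualified_name, device) for tensor attributes that are neither
    parameters nor buffers; model.to(device) does not move these."""
    found = []
    for mod_name, module in model.named_modules():
        # Parameters, buffers and submodules live in _parameters/_buffers/_modules;
        # a plain tensor assigned as an attribute ends up in the instance __dict__.
        for attr, value in vars(module).items():
            if isinstance(value, torch.Tensor):
                name = f"{mod_name}.{attr}" if mod_name else attr
                found.append((name, value.device))
    return found
```

Any name it reports is a candidate for `register_buffer` or an explicit `.to(inputs.device)` before it is combined with the inputs.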

YutingXiao / Amodal-Segmentation-Based-on-Visible-Region-Segmentation-and-Shape-Prior

Multi-GPU #6