I tried to run distributed training across multiple machines with a single GPU each, and found that it takes much more time than training on a single machine with a single GPU, so I ran the following test:
Config: configs/coco/instance-segmentation/maskformer2_R50_bs16_50ep.yaml (with batch size modified to 4)
num_gpus: one machine with a single RTX 3090 vs. two machines with a single RTX 3090 each
The results show that training takes 7 days on the single RTX 3090, but 70 days with distributed training.
The two machines are on the same local area network, connected through an Ethernet switch with CAT-6 cables.
Can you give me some advice on this issue?
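In case it helps to diagnose the setup above, below is a minimal sketch of how the raw all-reduce throughput between the two nodes could be measured with torch.distributed, separately from the training code. The script name `bench_allreduce.py`, the gloo backend, and the 256 MB payload are illustrative assumptions on my part, not details of the actual training run.

```python
# bench_allreduce.py -- minimal two-node all-reduce throughput check (illustrative
# sketch, not part of Mask2Former). Launch on both machines, e.g. with:
#   torchrun --nnodes=2 --nproc_per_node=1 --node_rank=<0 or 1> \
#            --master_addr=<ip-of-node-0> --master_port=29500 bench_allreduce.py
import time
import torch
import torch.distributed as dist

def main():
    # "gloo" works over plain TCP on CPU tensors; GPU training normally uses "nccl".
    dist.init_process_group(backend="gloo")
    rank = dist.get_rank()

    # 256 MB of float32, roughly the order of magnitude of a per-step gradient sync.
    tensor = torch.randn(64 * 1024 * 1024)

    # Warm-up iteration so connection setup is not included in the timing.
    dist.all_reduce(tensor)

    iters = 10
    start = time.time()
    for _ in range(iters):
        dist.all_reduce(tensor)
    elapsed = time.time() - start

    size_gb = tensor.numel() * tensor.element_size() / 1e9
    if rank == 0:
        print(f"avg all-reduce time: {elapsed / iters:.3f} s "
              f"for a {size_gb:.2f} GB payload")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

If the per-step all-reduce time from this kind of check is comparable to (or larger than) the single-GPU step time, that would point to the Ethernet link rather than the GPUs as the limiting factor.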