I would like to ask whether the parallel-training code is correct. When I ran the code directly on an 8-GPU machine, all GPUs were busy, but training took a very long time. After I modified the code to train with DDP on 2 machines, the training time dropped drastically. Is the original parallel code wrong (the dataset is not sharded across processes for parallel loading)? I am using torch 1.13 instead of 1.7, and I wonder whether earlier versions of torch handled this parallelism automatically.
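In case it helps to compare, this is roughly the shape of the DDP + DistributedSampler setup I ended up with. It is only a minimal sketch with a dummy linear model and random tensors as stand-ins, not the repository's actual code:

```python
# Minimal DDP sketch (dummy model and random data as stand-ins; not the repo's code).
# Launch with one process per GPU, e.g.:  torchrun --nproc_per_node=8 ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group(backend="nccl")            # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])         # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda(local_rank)  # stand-in for the real model
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(10_000, 128),  # stand-in for the real dataset
                            torch.randint(0, 10, (10_000,)))
    sampler = DistributedSampler(dataset)               # each rank reads a distinct shard
    loader = DataLoader(dataset, batch_size=32, sampler=sampler, num_workers=4)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(3):
        sampler.set_epoch(epoch)                         # reshuffle the shards every epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The key difference from my original run is the `DistributedSampler`: without it, every process iterates over the full dataset instead of its own shard, which is what I suspect makes the single-machine run so slow.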