facebookresearch / moco-v3

PyTorch implementation of MoCo v3 https://arxiv.org/abs/2104.02057

Does this implementation support non-distributed training? #15

Closed Euphoria16 closed 3 years ago

Euphoria16 commented 3 years ago

I found that if I don't use distributed training, i.e. set --multiprocessing-distributed=False and use a single GPU, there seems to be no problem in main_moco.py with

   torch.cuda.set_device(args.gpu)
   model = model.cuda(args.gpu)

However, this error occurred when training started

AssertionError: Default process group is not initialized

This error can be traced back to

File "~/moco-v3/moco/builder.py", line 68, in contrastive_loss k = concat_all_gather(k)

and

File "~/moco-v3/moco/builder.py", line 178, in concat_allgather for in range(torch.distributed.get_world_size())]

This error is caused by the computation of contrastive_loss, which still relies on distributed training. So I wonder whether non-distributed training is unsupported even when multiprocessing-distributed=False is set.

endernewton commented 3 years ago

Yep, I think an easy way to enable it is to just skip that line: it performs an all-gather operation across all GPUs, and since non-distributed training only has one GPU, that line is not needed.
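
For example, one way to do this is to make the gather a no-op when no process group has been initialized, so the same code works on a single GPU. This is an untested sketch, not the exact code in the repo; the is_initialized() guard is the only change:

   import torch

   @torch.no_grad()
   def concat_all_gather(tensor):
       # Single-GPU / non-distributed run: no process group exists,
       # so there is nothing to gather and we return the tensor as-is.
       if not (torch.distributed.is_available() and torch.distributed.is_initialized()):
           return tensor
       # Distributed run: gather the tensor from every rank and concatenate.
       tensors_gather = [torch.ones_like(tensor)
                         for _ in range(torch.distributed.get_world_size())]
       torch.distributed.all_gather(tensors_gather, tensor, async_op=False)
       return torch.cat(tensors_gather, dim=0)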