LAION-AI / CLAP

Contrastive Language-Audio Pretraining
https://arxiv.org/abs/2211.06687
Creative Commons Zero v1.0 Universal

How to run on a single machine with multiple GPUs #122

Open CoderwayNew opened 10 months ago

CoderwayNew commented 10 months ago

The function that selects the device, init_distributed_device(args) at line 129 of training/main.py, seems to obtain only a single GPU. The key part of the function is defined as follows:

    if torch.cuda.is_available():
        if args.distributed and not args.no_set_device_rank:
            # in distributed mode, each process picks the GPU matching its local rank
            device = 'cuda:%d' % args.local_rank
        else:
            # otherwise everything falls back to the first GPU
            device = 'cuda:0'
        torch.cuda.set_device(device)
    else:
        device = 'cpu'

How can I make the project run on a single machine with multiple GPUs?

CoderwayNew commented 10 months ago

Solved: from the command line, go to the project directory and run:

torchrun --nnodes=1 --nproc_per_node=2 ./training/main.py

--nnodes=1 # single machine; --nproc_per_node=2 # two GPUs
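
For context, torchrun launches nproc_per_node copies of the script and sets the LOCAL_RANK, RANK, and WORLD_SIZE environment variables for each one; that is how args.local_rank ends up different in every process, so the `device = 'cuda:%d' % args.local_rank` branch above gives each process its own GPU. Below is a minimal, self-contained sketch of that mechanism (this is not the project's training code; the model and function names are placeholders):

    # Minimal per-process GPU selection sketch for a script launched with torchrun.
    import os
    import torch
    import torch.distributed as dist

    def main():
        # torchrun sets LOCAL_RANK / RANK / WORLD_SIZE for every process it spawns
        local_rank = int(os.environ["LOCAL_RANK"])
        dist.init_process_group(backend="nccl")

        # each process binds to its own GPU: cuda:0 and cuda:1 when nproc_per_node=2
        torch.cuda.set_device(local_rank)
        device = torch.device(f"cuda:{local_rank}")

        # wrap the model in DistributedDataParallel so gradients are synchronized
        model = torch.nn.Linear(16, 16).to(device)
        model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])

        # ... the real training loop would go here ...

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Launched the same way (torchrun --nnodes=1 --nproc_per_node=2 sketch.py), this starts two processes, one per GPU. The CLAP training script follows the same pattern, so no code changes should be needed as long as it is started with torchrun rather than plain python.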