Open · sakemin opened this issue 1 year ago
Any help here? Trying to do inference with multiple GPUs...
And "dora run -d" not work I have 8 GPUs and my script is as following:
dora run -d solver=musicgen/musicgen_base_32khz model/lm/model_scale=small continue_from=//pretrained/facebook/musicgen-small conditioner=text2music
however, it only ever finds one worker:
[08-17 10:28:15][root][INFO] - Getting pretrained compression model from HF facebook/encodec_32khz
[08-17 10:28:13][dora.distrib][INFO] - world_size is 1, skipping init.
[08-17 10:28:13][flashy.solver][INFO] - Instantiating solver MusicGenSolver for XP 4284c302
So distributed inference is not supported. Distributed training should work out of the box with dora run -d. Can you check in Python:
import torch
print(torch.cuda.device_count())
Multi-node training is supported with SLURM, but without SLURM it is a bit more complex...
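A slightly extended version of that check can help rule out a common culprit (this is a generic sanity check, not something specific to Dora):

import os
import torch

# dora run -d spawns one worker per GPU that PyTorch can see.
print(torch.cuda.device_count())
# If this is set to a single index (e.g. "0"), only one GPU is visible and world_size stays 1.
print(os.environ.get("CUDA_VISIBLE_DEVICES"))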
Same issue here when trying to train MusicGen on multiple GPUs with
dora run -d solver=musicgen/musicgen_base
and got
[08-30 12:17:47][dora.distrib][INFO] - world_size is 1, skipping init.
but I actually have 2 GPUs:
>>> import torch
>>> print(torch.cuda.device_count())
2
@yawnzh did you ever figure out a workaround for this? I want to train MusicGen on cloud GPUs that don't have SLURM set up.
And "dora run -d" not work I have 8 GPUs and my script is as following:
dora run -d solver=musicgen/musicgen_base_32khz model/lm/model_scale=small continue_from=//pretrained/facebook/musicgen-small conditioner=text2music
however, it always can only find one workers:
[�[36m08-17 10:28:15�[0m][�[34mroot�[0m][�[32mINFO�[0m] - Getting pretrained compression model from HF facebook/encodec_32khz�[0m [�[36m08-17 10:28:13�[0m][�[34mdora.distrib�[0m][�[32mINFO�[0m] - world_size is 1, skipping init.�[0m [�[36m08-17 10:28:13�[0m][�[34mflashy.solver�[0m][�[32mINFO�[0m] - Instantiating solver MusicGenSolver for XP 4284c302�[0m
same issue here when trying to train musicgen with multiple GPUs with
dora run -d solver=musicgen/musicgen_base
and got[08-30 12:17:47][dora.distrib][INFO] - world_size is 1, skipping init.
but actually I have 2 gpus>>> import torch >>> print(torch.cuda.device_count()) 2
Almost the same here. Did you solve this problem? @Maggione @yawnzh
Setting CUDA_VISIBLE_DEVICES before the dora command may work.
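For example, with 8 GPUs, something like the following (this just makes all devices explicitly visible to the dora process; it is not an official fix):

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 dora run -d solver=musicgen/musicgen_base_32khz model/lm/model_scale=small continue_from=//pretrained/facebook/musicgen-small conditioner=text2music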
Hello,
I have 8 A40 (48 GB) GPUs, and I want to use them all for training and inference.
But I can't find any multi-GPU handling such as DataParallel or DistributedDataParallel in the train.py code; maybe it is wrapped inside Dora.
And for inference, I used the code from MUSICGEN.md below, with some tweaks. But it seems the MusicGen model is not a subclass of nn.Module; it has an lm model inside it, so if I wrap it as model = nn.DataParallel(model), it doesn't seem to use multiple GPUs.

Should I wrap model.lm as nn.DataParallel(model.lm) instead? I wonder whether the code would still work, since it currently calls lm.generate(); maybe that would have to become lm.module.generate(). Is there any pre-existing multi-GPU code in the repo?
Thanks.
Best, Sake
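For reference, since MusicGen is a plain Python wrapper rather than an nn.Module, one workaround for multi-GPU inference is process-level data parallelism: load an independent MusicGen instance on each GPU and split the prompts across them, instead of wrapping anything in nn.DataParallel. Below is a minimal sketch, assuming the public MusicGen.get_pretrained / set_generation_params / generate API and torch.multiprocessing; it is not an official multi-GPU path in the repo.

import torch
import torch.multiprocessing as mp
from audiocraft.models import MusicGen


def worker(gpu_id, prompts, out_queue):
    # One independent MusicGen instance per GPU; no nn.DataParallel wrapping needed.
    model = MusicGen.get_pretrained('facebook/musicgen-small', device=f'cuda:{gpu_id}')
    model.set_generation_params(duration=8)
    wav = model.generate(prompts)  # tensor of shape [batch, channels, time] on this GPU
    out_queue.put((gpu_id, wav.cpu()))


if __name__ == '__main__':
    prompts = ['lofi hip hop beat', 'orchestral film score',
               'acid techno loop', 'acoustic folk guitar']
    n_gpus = torch.cuda.device_count()
    # Round-robin the prompts across the available GPUs.
    shards = [prompts[i::n_gpus] for i in range(n_gpus)]

    ctx = mp.get_context('spawn')
    out_queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(i, shard, out_queue))
             for i, shard in enumerate(shards) if shard]
    for p in procs:
        p.start()
    # Collect results before joining so the queue does not block the children.
    results = [out_queue.get() for _ in procs]
    for p in procs:
        p.join()

Each process holds its own copy of the weights, so GPU memory usage scales with the number of processes, but the shards generate fully in parallel and nothing inside lm.generate() has to be modified.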