Doubiiu / DynamiCrafter

[ECCV 2024] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Apache License 2.0

Error of running inference on multiple GPUs #21

Open canqin001 opened 5 months ago

canqin001 commented 5 months ago

Hi @Doubiiu, thank you for the impressive work! I ran into an error when running sh scripts/run_mp.sh 1024:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:7! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution)
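For context, this error means a cuDNN convolution received an input on one GPU while its weights live on another. A minimal sketch that reproduces the same message (hypothetical code assuming 8 visible GPUs, not anything from this repo):

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3).to("cuda:0")  # weights on cuda:0
x = torch.randn(1, 3, 64, 64, device="cuda:7")      # input on cuda:7
y = conv(x)  # RuntimeError: Expected all tensors to be on the same device ...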

Do you have any suggestions?

Doubiiu commented 5 months ago

Hi Can, I have tested run_mp in my environment without any problem. I suspect there may be an environment incompatibility, for instance the PyTorch version?
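To compare environments quickly (plain PyTorch introspection, nothing repo-specific):

import torch

print(torch.__version__)          # PyTorch version
print(torch.version.cuda)         # CUDA version PyTorch was built against
print(torch.cuda.device_count())  # number of visible GPUs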

Chuchad commented 3 months ago

@canqin001 I have debugged this using import rpdb. I have now tried with 2 and 4 GPUs and everything works (any number will probably be fine). You can edit the related files as follows:

In lvdm/models/autoencoder.py, around line 97, edit encode as follows:

def encode(self, x, **kwargs):
    if torch.distributed.is_initialized():
        x = x.to(f"cuda:{torch.distributed.get_rank()}")
    # ... rest of the original encode body unchanged

In lvdm/modules/encoders/condition.py, around line 214, edit encode_with_transformer as follows:

def encode_with_transformer(self, text):
    if torch.distributed.is_initialized():
        x = self.model.token_embedding(text.to(f"cuda:{torch.distributed.get_rank()}"))
    else:
        x = self.model.token_embedding(text)
    # ... rest of the original encode_with_transformer body unchanged

In scripts/evaluation/inference.py, around line 174, add:

img = videos[:, :, 0]  # b, c, h, w
img = img.to(model.device)
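All three edits apply the same pattern: when torch.distributed is initialized, move the tensor onto this rank's GPU before it reaches a module whose weights live there. A small helper capturing the idea (to_local_rank is my own name, not something in the repo):

import torch

def to_local_rank(t):
    # Under torch.distributed, move the tensor to this process's GPU;
    # in single-process runs, return it untouched.
    if torch.distributed.is_initialized():
        return t.to(f"cuda:{torch.distributed.get_rank()}")
    return t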

Hope this helps.

jryebread commented 2 months ago

@Chuchad Thank you for the great info, this allowed it to work on 4 A10 GPUs finally!

Do you know if it is possible to keep the checkpoint loaded on the GPUs so that inference is faster? For example, every time I run the model via run_mp.sh, the output says "model checkpoint loaded". How can I have it loaded before the inference call with an image?

Chuchad commented 2 months ago

> @Chuchad Thank you for the great info, this allowed it to work on 4 A10 GPUs finally!
>
> Do you know if it is possible to keep the checkpoint loaded on the GPUs so that inference is faster? For example, every time I run the model via run_mp.sh, the output says "model checkpoint loaded". How can I have it loaded before the inference call with an image?

You mean the GPUs load the model checkpoint in advance, so the process can run inference whenever it receives a request? You could integrate the run_mp.sh pipeline with gradio_app.py. But it seems a little troublesome, haha.
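If it helps, the general shape of that integration is "load once, serve many". A minimal sketch, where build_model stands in for DynamiCrafter's actual checkpoint loading (hypothetical, not the repo's API):

import gradio as gr
import torch

def build_model():
    # Placeholder for the real checkpoint loading; runs once at startup,
    # so the "model checkpoint loaded" cost is paid before any request arrives.
    return torch.nn.Identity().eval()

MODEL = build_model()

def infer(image):
    # Every request reuses the already-loaded model.
    with torch.no_grad():
        return MODEL(image)

gr.Interface(fn=infer, inputs="image", outputs="image").launch()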