Open canqin001 opened 5 months ago
Hi Can, I have tested run_mp in my environment without problems. Could there be an incompatibility in your environment, for instance the PyTorch version?
@canqin001 I have debugged this using `import rpdb`. Now I have tried with 2 and 4 GPUs and everything works well (maybe any number will be fine). You can edit the related files as follows:
In `lvdm/models/autoencoder.py`, around line 97, change `encode` to:

```python
def encode(self, x, **kwargs):
    # move the input onto this process's GPU before encoding
    if torch.distributed.is_initialized():
        x = x.to(f"cuda:{torch.distributed.get_rank()}")
```
In `lvdm/modules/encoders/conditions.py`, around line 214, change `encode_with_transformer` to:

```python
def encode_with_transformer(self, text):
    # embed the tokens on this process's GPU
    if torch.distributed.is_initialized():
        x = self.model.token_embedding(text.to(f"cuda:{torch.distributed.get_rank()}"))
    else:
        x = self.model.token_embedding(text)
```
In `scripts/evaluation/inference.py`, around line 174, add:

```python
img = videos[:, :, 0]  # b c h w
img = img.to(model.device)
```
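All three patches apply the same fix: route tensors onto the GPU owned by the current distributed rank before using them. Here is a minimal sketch of that device-selection logic (the `pick_device` helper is hypothetical, not part of the repo; its `is_initialized`/`get_rank` arguments stand in for `torch.distributed.is_initialized`/`torch.distributed.get_rank`):

```python
def pick_device(is_initialized, get_rank):
    """Return the per-rank CUDA device string when distributed mode is
    active, else None, meaning leave the tensor where it already is."""
    if is_initialized():
        return f"cuda:{get_rank()}"
    return None


# Single-process run: no move needed.
print(pick_device(lambda: False, lambda: 0))   # None

# Rank 3 of a multi-GPU run: tensors should live on cuda:3.
print(pick_device(lambda: True, lambda: 3))    # cuda:3
```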
Hope this could help you.
@Chuchad Thank you for the great info, this finally allowed it to work on 4 A10 GPUs!
Do you know if it is possible to keep the checkpoint loaded on the GPUs so that inference is faster when I want to run it? For example, every time I run the model via run_mp.sh the output says "model checkpoint loaded". How can I have it loaded before the inference call with an image?
You mean the GPUs load the model checkpoint in advance, and the process runs inference whenever it receives your requests? You can integrate the run_mp.sh pipeline with gradio_app.py. But it seems a little troublesome, haha.
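One way to avoid reloading on every request is the usual load-once pattern: load the checkpoint lazily on the first call and cache the result for all later calls. A rough sketch, assuming a hypothetical `load_checkpoint` callable rather than the repo's actual loading code:

```python
class CachedModel:
    """Load an expensive checkpoint once and reuse it across requests."""

    def __init__(self, load_checkpoint):
        self._load_checkpoint = load_checkpoint
        self._model = None

    def get(self):
        # First call pays the load cost; later calls return the cached model.
        if self._model is None:
            self._model = self._load_checkpoint()
        return self._model


loads = []
server = CachedModel(lambda: loads.append("load") or "model")
server.get()
server.get()
print(len(loads))  # the checkpoint was loaded only once
```

A gradio_app.py-style server would build one `CachedModel` at startup and call `get()` inside the request handler, so the "model checkpoint loaded" step runs once instead of on every invocation.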
Hi @Doubiiu, thank you for the impressive work! I met an error when running

```shell
sh scripts/run_mp.sh 1024
```

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:7! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution)
```
Do you have any suggestions?