TMElyralab / MuseV

MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising

MuseV still reports OOM during image-to-video generation even with multiple GPUs #141

Open · youtianhong opened this issue 3 months ago

youtianhong commented 3 months ago

Background:

On startup the process takes about 12 GB of GPU memory. Then, when I start an image-to-video run from the gradio UI, it reports OOM:

```
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 176.00 MiB (GPU 0; 14.57 GiB total capacity; 13.61 GiB already allocated; 118.75 MiB free; 14.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

My question: the production machine has 4 GPUs with 16 GB each, 64 GB in total. How do I make MuseV spread its memory usage across the cards automatically? Right now it takes 12 GB on startup and OOMs as soon as I run anything in gradio. (Pinning everything to a single card doesn't help either; one card can't carry it, since step one and step two both land on the same card and exhaust it.)
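For what it's worth, the error message itself points at one mitigation that is independent of device placement: capping the allocator's split size to reduce fragmentation. A minimal sketch (the 128 MiB value is an arbitrary starting point, not a MuseV-documented setting):

```python
# Must be set before torch makes its first CUDA allocation,
# e.g. at the very top of app.py, or equivalently in the shell:
#   export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

This only eases fragmentation; it cannot make a 16 GB card hold a workload that genuinely needs more.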

xzqjack commented 3 months ago

@youtianhong With the current gradio script, image2video and image+middle2video do indeed run on the same card. You could move one of them (e.g. image+middle2video) to a different card; torch accepts device strings like "cuda:0" and "cuda:1" for this.

youtianhong commented 3 months ago

Could you spell that out a bit? What does "move" mean here concretely? What I want is for startup to occupy card 1 and for the gradio image-to-video run to occupy card 2; how do I set that up?

youtianhong commented 3 months ago

It took enormous effort just to get this thing running (a pile of errors, and I even had to patch the source), and now the actual image-to-video run OOMs.

xzqjack commented 3 months ago

https://github.com/TMElyralab/MuseV/blob/main/scripts/gradio/gradio_video2video.py#L191 Right now both image2video and video2video set device="cuda", which torch resolves to "cuda:0" by default, so you could try changing video2video to device="cuda:1".
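Concretely, that is a one-line edit in the script (a sketch; the surrounding code at that line may look different):

```python
# scripts/gradio/gradio_video2video.py, around line 191
# device = "cuda"    # before: torch resolves the bare "cuda" to cuda:0
device = "cuda:1"    # after: pin the video2video pipeline to the second card
```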

youtianhong commented 3 months ago

Thanks for the reply, but it doesn't seem to work. I set device = "cuda:1" in gradio_text2video.py and "cuda:2" in gradio_video2video.py, then launched with python app.py (gradio). Below is the GPU memory usage after startup; three cards are occupied:

```
| Processes:                                                              |
| GPU   GI   CI        PID   Type   Process name              GPU Memory |
|       ID   ID                                               Usage      |
|=========================================================================|
|   0   N/A  N/A      3261      C   python                       132MiB  |
|   1   N/A  N/A      3261      C   python                      5608MiB  |
|   2   N/A  N/A      3261      C   python                      6512MiB  |
|   3   N/A  N/A     19947      C   /usr/local/bin/ollama        100MiB  |
```

The error is below. (Do they have to be on a single card? What did I set wrong?)

```
File "/data/env/digital-human/musev/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
File "/data/env/digital-human/musev/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution)
```

youtianhong commented 3 months ago

@xzqjack I now did exactly what you said: I only changed video2video to device="cuda:2" and reverted the text2video edit (back to the default, card 0). It still fails, this time with OOM again. Is multi-GPU really a dead end here? Please advise. GPU usage:

```
| Processes:                                                              |
| GPU   GI   CI        PID   Type   Process name              GPU Memory |
|       ID   ID                                               Usage      |
|=========================================================================|
|   0   N/A  N/A     29536      C   python                     13614MiB  |
|   2   N/A  N/A     29536      C   python                      6512MiB  |
|   3   N/A  N/A     19947      C   /usr/local/bin/ollama        100MiB  |
```
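As a debugging aside, it can help to print what torch itself holds on each card, independently of nvidia-smi; a large gap between "allocated" and "reserved" also hints at fragmentation. A self-contained check using standard torch APIs:

```python
import torch

# Report per-GPU memory as torch sees it.
for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 2**20  # MiB held by live tensors
    reserved = torch.cuda.memory_reserved(i) / 2**20    # MiB cached by the allocator
    print(f"cuda:{i}: allocated {allocated:.0f} MiB, reserved {reserved:.0f} MiB")
```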

xzqjack commented 3 months ago

@youtianhong The mechanism itself looks workable; the script just doesn't fully propagate the device switch everywhere.
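The traceback above is the classic symptom of that: a conv layer's weights sit on one card while its input tensor sits on another. Wherever the script hands a tensor to a module, the input has to be moved to that module's device first. An illustrative pattern (the helper name is made up for this sketch, not MuseV's actual code):

```python
import torch

def forward_on_module_device(module: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    # Find the device that holds the module's weights...
    device = next(module.parameters()).device
    # ...and move the input there before running the forward pass.
    return module(x.to(device))
```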

youtianhong commented 3 months ago

Could you folks fix this and publish a release? Thanks a lot.

hjj-lmx commented 1 month ago

> It took enormous effort just to get this thing running (a pile of errors, and I even had to patch the source), and now the actual image-to-video run OOMs.

How did you manage to deploy it? I can't get it running locally no matter what; installing the dependencies fails with errors everywhere.