TMElyralab / MuseV

MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising

MuseV still reports OOM during image-to-video generation even with multiple GPUs #141

Open youtianhong opened 5 months ago

youtianhong commented 5 months ago

Background:

On startup the app uses about 12 GB of VRAM; then, when I kick off image-to-video generation in the Gradio UI, it reports OOM:

```
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 176.00 MiB (GPU 0; 14.57 GiB total capacity; 13.61 GiB already allocated; 118.75 MiB free; 14.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
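For reference, the max_split_size_mb knob the message mentions can be set through the PYTORCH_CUDA_ALLOC_CONF environment variable before torch is imported; the value of 128 below is illustrative, not a recommendation:

```python
import os

# Must be set before torch initializes its CUDA caching allocator;
# 128 MiB caps the block-split size to reduce fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402
```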

My question: the production machine has four GPUs with 16 GB each, 64 GB in total. How do I get MuseV to spread its VRAM usage across multiple cards automatically? Startup alone takes 12 GB, and the moment I run a job in Gradio it OOMs. (Pinning everything to a single card doesn't help either; one card simply can't carry it, since step one and step two both land on the same card and exhaust it.)

xzqjack commented 5 months ago

@youtianhong With the current Gradio script, image2video and image+middle2video do indeed both run on one card. You could move one of them (e.g. image+middle2video) to a different card; torch lets you target a specific GPU with device strings like "cuda:0" and "cuda:1".
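A minimal illustration of what such a device string does (the module and tensor here are placeholders, not MuseV code):

```python
import torch

device = torch.device("cuda:1")            # target the second GPU explicitly
model = torch.nn.Linear(8, 8).to(device)   # placeholder module, moved to cuda:1
x = torch.randn(4, 8, device=device)       # inputs must live on the same device
y = model(x)                               # runs entirely on cuda:1
```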

youtianhong commented 5 months ago

Could you spell that out a bit more? What do you mean by "move one of them"? What I want is for startup to use GPU 1 and the Gradio image-to-video run to use GPU 2. How do I set that up?

youtianhong commented 5 months ago

It took an enormous effort just to get this running (a pile of errors, and I had to patch the source), and now actual image-to-video generation OOMs.

xzqjack commented 5 months ago

https://github.com/TMElyralab/MuseV/blob/main/scripts/gradio/gradio_video2video.py#L191 At the moment both image2video and video2video set device="cuda", which torch resolves to "cuda:0" by default, so you could try changing the video2video one to device="cuda:1".
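If you would rather not hardcode the card, one option is to read it from an environment variable at the top of the script; note that MUSEV_V2V_DEVICE below is a made-up name, not something the repo defines:

```python
import os

import torch

# Hypothetical: choose the video2video device from the environment,
# defaulting to the second GPU when more than one is visible.
device = os.environ.get(
    "MUSEV_V2V_DEVICE",
    "cuda:1" if torch.cuda.device_count() > 1 else "cuda",
)
```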

youtianhong commented 5 months ago

Thanks for the reply, but it doesn't seem to work. I set device = "cuda:1" in gradio_text2video.py and "cuda:2" in gradio_video2video.py, then launched directly with python app.py (gradio). Below is the GPU memory usage after startup; three cards are occupied:

```
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory  |
|        ID   ID                                                             Usage       |
|=========================================================================================|
|    0   N/A  N/A      3261      C   python                                      132MiB  |
|    1   N/A  N/A      3261      C   python                                     5608MiB  |
|    2   N/A  N/A      3261      C   python                                     6512MiB  |
|    3   N/A  N/A     19947      C   /usr/local/bin/ollama                       100MiB  |
```

The error is below. (Does everything have to be on one card? Where did I go wrong?)

```
File "/data/env/digital-human/musev/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
File "/data/env/digital-human/musev/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution)
```
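For reference, this RuntimeError typically means an input tensor and a module's weights ended up on different GPUs; the usual fix is to move the input to the module's device right before the call. A generic sketch, not MuseV code:

```python
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3).to("cuda:1")  # weights live on cuda:1
x = torch.randn(1, 3, 64, 64, device="cuda:0")            # input lives on cuda:0

# conv(x) here would raise the same RuntimeError as above.
x = x.to(next(conv.parameters()).device)  # move the input to wherever the weights are
y = conv(x)                               # now runs on cuda:1 without complaint
```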

youtianhong commented 5 months ago

@xzqjack Following your suggestion, I now changed only video2video to device="cuda:2" and removed the setting from text2video (so it defaults to 0), but it still fails (OOM again this time). Is multi-GPU really a dead end here? Any guidance appreciated.

```
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory  |
|        ID   ID                                                             Usage       |
|=========================================================================================|
|    0   N/A  N/A     29536      C   python                                    13614MiB  |
|    2   N/A  N/A     29536      C   python                                     6512MiB  |
|    3   N/A  N/A     19947      C   /usr/local/bin/ollama                       100MiB  |
+-----------------------------------------------------------------------------------------+
```

xzqjack commented 5 months ago

@youtianhong The mechanism itself looks workable; it's just that somewhere the script isn't fully adapted to switching devices.
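One way such an adaptation could look: thread a single `device` variable through the whole pipeline and defensively move every incoming tensor onto it, rather than relying on the bare "cuda" default. This is an illustrative sketch with stand-in modules, not the actual MuseV pipeline:

```python
import torch

# Pin the whole "pipeline" to one explicit device.
device = torch.device("cuda:1" if torch.cuda.device_count() > 1 else "cuda:0")

encoder = torch.nn.Conv2d(3, 16, 3, padding=1).to(device)  # stand-in for the real VAE/UNet
decoder = torch.nn.Conv2d(16, 3, 3, padding=1).to(device)

def run(image: torch.Tensor) -> torch.Tensor:
    image = image.to(device)       # defensive move: a no-op if already on `device`
    latents = encoder(image)
    return decoder(latents).cpu()  # hand the result back on CPU for saving

out = run(torch.randn(1, 3, 64, 64))  # input starts on CPU; no cross-device error
```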

youtianhong commented 5 months ago

Could you folks fix this and cut a release? Much appreciated.

hjj-lmx commented 3 months ago

> It took an enormous effort just to get this running (a pile of errors, and I had to patch the source), and now actual image-to-video generation OOMs.

How did you manage to deploy it? I can't get it to run locally no matter what; installing the dependencies fails with errors everywhere.