SlimeVRX opened this issue 4 months ago
Yes, using a single 4090 GPU it's the same for me. I changed 72 to 32, and the result was even worse than with the old checkpoint. Maybe I did something wrong.
@SlimeVRX @ak01user The sample video we provide is 35 seconds long. You can try testing with a shorter video to obtain the expected results within an acceptable time.
You can verify whether the model is utilizing the GPU effectively. The expected inference speed of a 72-frame denoising step is less than 5 seconds per iteration on an A100 GPU. However, your reported speed of 251 seconds per iteration appears unusually slow for a 3090 GPU.
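One quick sanity check is to confirm that CUDA is actually visible before launching inference; a minimal sketch assuming PyTorch is installed (the pipeline attribute mentioned in the trailing comment is hypothetical):

```python
import torch

# Quick sanity check before launching inference.py: is CUDA visible, and which
# GPU will be used? If this fails, inference silently falls back to CPU speeds.
assert torch.cuda.is_available(), "CUDA not available - inference would run on CPU"
print("GPU:", torch.cuda.get_device_name(0))
print("Total VRAM (GB):", torch.cuda.get_device_properties(0).total_memory / 1024**3)

# Hypothetical: once the pipeline object from inference.py is constructed, its
# weights should report a cuda device, e.g. next(pipe.unet.parameters()).device
```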
If the VRAM requirement is too high, I hope this will help: https://github.com/Tencent/MimicMotion/issues/21#issuecomment-2213978415
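Separately from the linked comment, standard diffusers pipelines expose a couple of memory-saving toggles that trade speed for VRAM. Whether MimicMotion's own pipeline object exposes them is an assumption, so the sketch below only applies them if they exist:

```python
def reduce_vram(pipe):
    """Apply common diffusers memory-saving toggles if the pipeline has them.

    Sketch only: `pipe` stands in for whatever pipeline object inference.py
    constructs; the project's own VRAM fix is the one in the linked comment.
    """
    if hasattr(pipe, "enable_vae_slicing"):
        pipe.enable_vae_slicing()        # decode latent frames one at a time
    if hasattr(pipe, "enable_model_cpu_offload"):
        pipe.enable_model_cpu_offload()  # keep idle submodules in system RAM
    return pipe
```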
@gujiaxi Sorry, I just tested a 15-second-long video and it will take 3h30m. That is too long.
I tried the old model (MimicMotion_1) again and identified that the cause is not the model but the number of frames, which is 72 (RTX 3090 24GB).
Done Pre-process data!
2%|████▌ | 1/50 [03:47<3:05:29, 227.14s/it]
2%|████▌ | 1/50 [04:26<3:37:50, 266.75s/it]
Are you sure you're not spilling into system RAM? What resolution & number of frames are you doing?
Yes, shared GPU memory is at 11.2/15.9 GB; this may be the cause of the slowness.
ckpt_path: models/MimicMotion_1-1.pth num_frames: 72 resolution: 576
Updated the last commit!
To prevent this from happening, go to the NVIDIA Control Panel; under "Manage 3D Settings", change "CUDA - System fallback policy" to "Prefer no sysmem fallback". That way you'll just get out-of-memory errors instead of waiting 5 hours.
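To see the fallback happening (rather than just blocking it), you can watch reported GPU memory while a run is in progress; a small sketch using nvidia-smi's query mode (polling count and interval are arbitrary):

```python
import subprocess
import time

# Poll nvidia-smi while inference.py runs in another terminal. If memory.used
# pins at the card's capacity and speed collapses, the driver is spilling into
# shared system memory (the sysmem fallback discussed above).
for _ in range(12):
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())
    time.sleep(5)
```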
This issue is indeed caused by spilling into system RAM (or by out-of-memory errors). I am actively working on minimizing the VRAM requirements. I anticipate that the 72-frame model will perform efficiently on a 4090.
After syncing with the latest PR #32 from last night, I also got a long wait time and large VRAM usage on a 4090, running python inference.py --inference_config configs/test.yaml
@Minamiyama This seems strange. One difference is that I tested on Ubuntu, but I don't have a Windows machine with a 4090 GPU for testing. Could you try the following setting?
To prevent this from happening, go to the NVIDIA Control Panel; under "Manage 3D Settings", change "CUDA - System fallback policy" to "Prefer no sysmem fallback". That way you'll just get out-of-memory errors instead of waiting 5 hours.
Also, is there anyone else on Windows or Ubuntu who can (or cannot) run the 72-frame model with 16 GB of VRAM?
Yes, as you mentioned, I then moved it into Docker and it ran totally fine, taking just 17 minutes 😄. Thanks very much!
Yes, I can run 72 frames successfully with the new model, but it gives the same results as the old model and has not improved.
| 40/1325 [04:06<2:12:39, 6.19s/it]
It will take 2h16m on my 3060. That is too long.
@zyayoung Version 1.1 gives worse results than version 1.
@akk-123 You need to set num_frames to 72 for the 1.1 model. The only difference between 1 and 1.1 is the number of frames per segment used during training. If you find that version 1 performs better for your needs, you can use it instead.
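For anyone unsure where to change it, here is a minimal sketch that writes a config variant with num_frames set to 72 for the 1.1 checkpoint; the exact key names inside configs/test.yaml are assumptions, so adjust them to the real layout:

```python
import yaml

# Load the stock config, switch to the 1.1 checkpoint and 72-frame segments,
# and write a variant config. Key names are assumed, not verified.
with open("configs/test.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["ckpt_path"] = "models/MimicMotion_1-1.pth"   # assumed key
cfg["num_frames"] = 72                            # assumed key

with open("configs/test_72frames.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```

Then run python inference.py --inference_config configs/test_72frames.yaml as in the earlier comment.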
I am inferring a video consisting of 72 frames with default parameters, but the processing time has increased significantly!
It takes 3h17m for 72 frames (RTX 3090 24GB).