Closed — wing158 closed this issue 1 month ago
num_channels_latents = self.denoising_unet.in_channels
35%|█████████████████████████████ | 7/20 [1:43:35<3:13:43, 894.13s/it]
This is the speed on a 24GB 3090. Can it be optimized, like MuseTalk?
I am also in this situation.
This is my inference speed on a V100.
It's probably related to the 24GB of VRAM being fully used and the GPU running at 100%; when both resources are saturated, the program becomes extremely slow.
Hello, guys! Thanks for your support. Here is a suggestion that may help with the low inference speed: try reducing the running resolution from 768x768x48 to 512x512x48.
- 512x512x48 takes 16GB VRAM.
- 768x768x48 takes 28GB VRAM. (I don't know why it runs at 768x768x48 at all on a 3090 with 24GB; it may be using a space-time tradeoff, which leads to low speed.)
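As a rough sanity check on those VRAM numbers (my own back-of-the-envelope arithmetic, not from the authors): activation memory scales roughly with pixel count, and 768²/512² = 2.25, which lands near the reported 28GB vs 16GB once a fixed weight overhead is subtracted. The 8GB overhead below is a hypothetical figure, not a measurement.

```python
# Back-of-the-envelope check (assumption: activation memory scales with
# pixel count; the 8 GB weight overhead is a guessed constant, not measured).
pixels_512 = 512 * 512
pixels_768 = 768 * 768
ratio = pixels_768 / pixels_512
print(ratio)  # 2.25

# If ~8 GB of the 16 GB footprint were fixed (weights etc.), the activation
# part would scale as (16 - 8) * 2.25 + 8 = 26 GB, close to the reported 28 GB.
est_768 = (16 - 8) * ratio + 8
print(est_768)  # 26.0
```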
Besides, we also report our running speed for generating a 10-second video:
- V100, 512x512x48: 5 minutes
- V100, 768x768x48: 16 minutes
- H800, 512x512x48: 1 minute
- H800, 768x768x48: 3 minutes
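The timings above can be restated as a real-time factor (seconds of compute per second of output video), which is simple arithmetic on the quoted numbers:

```python
# Convert the reported timings into "x times slower than real time".
timings = {
    ("V100", "512x512x48"): 5 * 60,   # 5 min for a 10 s video
    ("V100", "768x768x48"): 16 * 60,
    ("H800", "512x512x48"): 1 * 60,
    ("H800", "768x768x48"): 3 * 60,
}
video_seconds = 10
for (gpu, res), secs in timings.items():
    factor = secs / video_seconds
    print(f"{gpu} {res}: {factor:.0f}x slower than real time")
```

So even on an H800 at 512x512x48, generation runs about 6x slower than real time by the authors' own numbers.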
I was getting 360s/it on a 4090 with the default size.
Are there any parameters that could be changed to optimize VRAM a bit so 768x768 fits on 24GB?
512 works OK for me.
The inference config.json and the model hint say to put it under MusePose/pretrained_weights/sd-image-variations-diffusers/unet. After moving the unet there, it hangs for a long time with no progress:
root@153a7e76ceb5:~/MusePose# python test_stage_2.py --config ./configs/test_stage_2.yaml
Width: 768 Height: 768 Length: 300 Slice: 48 Overlap: 4
Classifier free guidance: 3.5 DDIM sampling steps: 20 skip 1
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel: ['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
/usr/local/lib/python3.10/dist-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
handle=== ./assets/images/ref.png ./assets/poses/align/img_ref_video_dance.mp4
pose video has 288 frames, with 24 fps
processing length: 144
fps 12
/root/MusePose/musepose/pipelines/pipeline_pose2vid_long.py:406: FutureWarning: Accessing config attribute `in_channels` directly via 'UNet3DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet3DConditionModel's config object instead, e.g. 'unet.config.in_channels'.
  num_channels_latents = self.denoising_unet.in_channels
0%| | 0/20 [00:00<?, ?it/s]