fudan-generative-vision / champ

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
https://fudan-generative-vision.github.io/champ/
MIT License

Inference time #29

Open puckikk1202 opened 6 months ago

puckikk1202 commented 6 months ago

Hi, I'm grateful for your excellent work! I've set up the code as per the instructions, and it runs without errors. However, inference is slow, at roughly 176 seconds per iteration. I tested it on an 80 GB A100, and it appears to use around 71 GB of GPU memory. Is this normal? [two screenshots attached]

ShenhaoZhu commented 6 months ago

Both the inference time and the GPU memory usage are far higher than expected. Try terminating unrelated processes on the GPU and then giving it another try.
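For reference, here is a minimal sketch (not part of this repository) that prints both PyTorch's view of the current process's allocations and nvidia-smi's per-process list, which should make any stray process holding VRAM easy to spot:

```python
# Minimal sketch (not from this repo) for checking what is holding GPU memory
# before launching inference.
import subprocess

import torch


def report_gpu_state() -> None:
    # What the current Python process has allocated through PyTorch.
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1024**3
        reserved = torch.cuda.memory_reserved() / 1024**3
        print(f"this process: {allocated:.2f} GiB allocated, {reserved:.2f} GiB reserved")

    # Every compute process on the card, including unrelated jobs.
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv"],
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout)


if __name__ == "__main__":
    report_gpu_state()
```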

G-force78 commented 6 months ago

That's odd; with a Google Colab A100 on motion-06 it's peaking at 12.2 GB.

100% 20/20 [01:21<00:00, 4.05s/it]
100% 116/116 [00:04<00:00, 27.11it/s]

chengzeyi commented 6 months ago

That's odd; with a Google Colab A100 on motion-06 it's peaking at 12.2 GB.

100% 20/20 [01:21<00:00, 4.05s/it]
100% 116/116 [00:04<00:00, 27.11it/s]

How did you get that? With an RTX 4090 I see much higher VRAM usage than that. OK, that may be related to a bug in WSL2, but how did you achieve that speed?

G-force78 commented 6 months ago

You need plenty of system RAM too. I've just tried the T4 on the Colab free tier, but system RAM maxed out at 12 GB while loading the motion module. Maybe the weights could be loaded straight into VRAM instead if you have enough of it?
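As a rough sketch (the checkpoint path and variable names here are guesses, not the repo's actual code), torch.load with map_location can deserialize the weights straight onto the GPU, so the full motion module never has to sit in system RAM at once:

```python
import torch

# Hypothetical path; substitute the actual motion-module checkpoint.
MOTION_MODULE_PATH = "pretrained_models/motion_module/motion_module.pth"

# map_location moves each storage to the GPU as it is deserialized, keeping the
# peak system-RAM footprint much smaller than loading to CPU and calling .cuda().
state_dict = torch.load(MOTION_MODULE_PATH, map_location="cuda:0")

# The module would then be populated as usual, e.g.:
# motion_module.load_state_dict(state_dict, strict=False)
```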

Here is my config file using motion-06

num_inference_steps: 20
guidance_scale: 6
enable_zero_snr: true
weight_dtype: "fp16"

guidance_types:

noise_scheduler_kwargs:
  num_train_timesteps: 1000
  beta_start: 0.00085
  beta_end: 0.012
  beta_schedule: "linear"
  steps_offset: 1
  clip_sample: false

unet_additional_kwargs:
  use_inflated_groupnorm: true
  unet_use_cross_frame_attention: false
  unet_use_temporal_attention: false
  use_motion_module: true
  motion_module_resolutions:

guidance_encoder_kwargs:
  guidance_embedding_channels: 320
  guidance_input_channels: 3
  block_out_channels: [16, 32, 96, 256]

enable_xformers_memory_efficient_attention: true
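For what it's worth, here is a rough, diffusers-style sketch (the pipeline class and model path are placeholders, not Champ's actual names) of what the memory-related options above usually correspond to in code:

```python
# Rough sketch only; DiffusionPipeline and the model path stand in for the
# project's actual pipeline class and checkpoints.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "path/to/champ/checkpoints",   # placeholder path
    torch_dtype=torch.float16,     # weight_dtype: "fp16"
).to("cuda")

# enable_xformers_memory_efficient_attention: true
pipe.enable_xformers_memory_efficient_attention()

# num_inference_steps and guidance_scale from the config are passed per call:
# result = pipe(..., num_inference_steps=20, guidance_scale=6)
```

Half-precision weights plus xformers attention are what keep the peak VRAM in the low tens of GB here; if either is silently disabled, usage can balloon well beyond that.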