Open TonyEiven opened 2 months ago
GPU: NVIDIA H20
Audio length: 1 min 30 s
Audio format: wav
Image format: png
Image size: 208K, 525x526, 25 fps, 25 tbr, 25 tbn
Branch: main
Configuration:

# configs/prompts/animation_acc.yaml
## dependency models
pretrained_base_model_path: "./pretrained_weights/sd-image-variations-diffusers/"
pretrained_vae_path: "./pretrained_weights/sd-vae-ft-mse/"
audio_model_path: "./pretrained_weights/audio_processor/whisper_tiny.pt"
## echo mimic checkpoint
denoising_unet_path: "./pretrained_weights/denoising_unet_acc.pth"
reference_unet_path: "./pretrained_weights/reference_unet.pth"
face_locator_path: "./pretrained_weights/face_locator.pth"
motion_module_path: "./pretrained_weights/motion_module_acc.pth"
## denoise model configs
inference_config: "./configs/inference/inference_v2.yaml"
weight_dtype: 'fp16'
## test cases
test_cases:
  "./assets/test_imgs/test.png":

# configs/inference/inference_v2.yaml
unet_additional_kwargs:
  use_inflated_groupnorm: true
  unet_use_cross_frame_attention: false
  unet_use_temporal_attention: false
  use_motion_module: true
  cross_attention_dim: 384
  motion_module_resolutions:
noise_scheduler_kwargs:
  beta_start: 0.00085
  beta_end: 0.012
  beta_schedule: "linear"
  clip_sample: false
  steps_offset: 1
  ## Zero-SNR params
  prediction_type: "v_prediction"
  rescale_betas_zero_snr: True
  timestep_spacing: "trailing"
sampler: DDIM
I believe you are using the accelerated model for inference. In infer_audio2vid_acc.py, the 'L' argument defaults to 1200:
parser.add_argument("-L", type=int, default=1200)
Try setting 'L' to a much larger value, such as 2100, and check the result.
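To choose a value for 'L', it may help to estimate how many frames the audio needs. The following is a minimal sketch, assuming that 'L' caps the number of generated frames and that the output frame rate is known (the issue lists 25 fps). The helper names `wav_duration_seconds` and `frames_needed` are hypothetical and not part of the repository.

```python
import math
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (stdlib only)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def frames_needed(duration_seconds: float, fps: float) -> int:
    """Number of video frames required to cover the whole audio clip.

    Assumption: the -L argument limits the generated frame count, so
    L should be at least this value to avoid a truncated video.
    """
    return math.ceil(duration_seconds * fps)


if __name__ == "__main__":
    # 1 min 30 s of audio at 25 fps, as reported in the issue.
    print(frames_needed(90, 25))
```

For the 90-second clip in this issue, the estimate is 2250 frames at 25 fps. Under these assumptions, a value of 2100 may still cut the video short, so a value at or above the estimate is likely the safer choice.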