ali-vilab / UniAnimate

Code for Paper "UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation".
https://unianimate.github.io/

some good results #12

Open shaoguowen opened 3 months ago

shaoguowen commented 3 months ago

The results are quite good; in particular, the face is not badly distorted. I ran into two bugs:

  1. The first is in the `load_video_frames` function of tools/inferences/inference_unianimate_long_entrance.py. It reads the image names with `for ii_index in os.listdir(pose_file_path)`, which may not return them in order. Adding `sorted` fixed it: `for ii_index in sorted(os.listdir(pose_file_path))`.
  2. GPU memory usage was too high. After investigating, I found that fp16 was not actually being used. With the change below, 32x512x768 runs within 14 GB of GPU memory. The modified code:

         device = "cuda"
         dtype = torch.float16

         clip_encoder = EMBEDDER.build(cfg.embedder)
         clip_encoder.model.to(device, dtype=dtype)
         with torch.no_grad():
             _, _, zero_y = clip_encoder(text="")

         autoencoder = AUTO_ENCODER.build(cfg.auto_encoder)
         autoencoder.eval()  # freeze
         for param in autoencoder.parameters():
             param.requires_grad = False
         autoencoder.to(device, dtype=dtype)

         if "config" in cfg.UNet:
             cfg.UNet["config"] = cfg
         cfg.UNet["zero_y"] = zero_y
         model = MODEL.build(cfg.UNet)
         state_dict = torch.load(cfg.test_model, map_location='cpu')
         if 'state_dict' in state_dict:
             state_dict = state_dict['state_dict']
         if 'step' in state_dict:
             resume_step = state_dict['step']
         else:
             resume_step = 0
         status = model.load_state_dict(state_dict, strict=True)
         logging.info('Load model from {} with status {}'.format(cfg.test_model, status))
         model = model.to(device, dtype=dtype)
         model.eval()
         torch.cuda.empty_cache()
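The frame-ordering problem from the first bug can be illustrated with a small sketch. `os.listdir` returns entries in arbitrary order, so the names have to be sorted before the frames are loaded. Plain `sorted` is enough when the file names are zero-padded (for example `0001.jpg`). The helper names `natural_key` and `list_frames_in_order` below are hypothetical and not part of the UniAnimate code; the natural-sort key is an added assumption for directories whose file names are not zero-padded.

```python
import os
import re


def natural_key(name):
    """Split a file name into text and integer chunks so 'frame_10' sorts after 'frame_2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def list_frames_in_order(pose_file_path):
    """Return the file names in pose_file_path in a deterministic, numeric-aware order.

    Hypothetical helper for illustration only; the fix reported in this issue
    is simply sorted(os.listdir(pose_file_path)).
    """
    return sorted(os.listdir(pose_file_path), key=natural_key)
```

With zero-padded names, `list_frames_in_order` and plain `sorted(os.listdir(...))` give the same order; they differ only for names such as `frame_2.jpg` and `frame_10.jpg`, where lexicographic sorting would put `frame_10.jpg` first.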

Some results:

https://github.com/ali-vilab/UniAnimate/assets/57278682/6cff04e6-dd99-49ce-8227-9536c183b4c2

https://github.com/ali-vilab/UniAnimate/assets/57278682/4486599e-98bb-4e02-b76e-a4c6739ad0b7

Feel free to follow my WeChat Channel, 温少的AIGC, to exchange ideas and discuss using AIGC for side projects~

wangxiang1230 commented 3 months ago

Hi, thanks for your contribution. We will update the code as you suggested. We also welcome further comments on improving code and/or better results.

wangxiang1230 commented 3 months ago

In addition, we usually find that changing the resolution to 1216x768 gives better results (less jitter) if you have sufficient GPU memory.
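As a rough guide to what "sufficient GPU memory" might mean, the sketch below compares the size of a raw RGB clip tensor at the two resolutions discussed in this thread. It is back-of-the-envelope arithmetic only, not a measurement of UniAnimate's actual memory use. It assumes 32 frames (the clip length reported above) and that activation memory grows roughly in proportion to pixel count; `clip_bytes` is a hypothetical helper.

```python
def clip_bytes(frames, height, width, bytes_per_element, channels=3):
    """Size in bytes of a raw video tensor of shape (frames, channels, height, width)."""
    return frames * channels * height * width * bytes_per_element


FP16_BYTES = 2  # element size of torch.float16
FP32_BYTES = 4  # element size of torch.float32

small = clip_bytes(32, 768, 512, FP16_BYTES)
large = clip_bytes(32, 1216, 768, FP16_BYTES)

# Ratio of element counts between the two resolutions at the same dtype.
scale = large / small
print(f"1216x768 holds {scale:.3f}x the pixels of 512x768")

# Same clip stored in fp32 instead of fp16.
fp32_ratio = clip_bytes(32, 768, 512, FP32_BYTES) / small
print(f"fp32 needs {fp32_ratio:.0f}x the bytes of fp16 for the same clip")
```

Under these assumptions, the higher resolution carries a little under 2.4 times as many pixels per frame, so memory headroom well beyond the 14 GB figure reported above is likely needed.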

shaoguowen commented 3 months ago

> In addition, we usually find that changing the resolution to 1216x768 gives better results (less jitter) if you have sufficient GPU memory.

Thanks, I will give it a try~