THUDM / CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Apache License 2.0
8.27k stars, 781 forks

Fine-tuning hangs at VAE encode #413

Open foreverpiano opened 1 week ago

foreverpiano commented 1 week ago

System Info

11.8, PyTorch 2.5.0

Information

Reproduction

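def encode_video(video):
    # video: [F, C, H, W]; unsqueeze(0) adds a batch dim -> [B, F, C, H, W]
    video = video.to(accelerator.device, dtype=vae.dtype).unsqueeze(0)
    video = video.permute(0, 2, 1, 3, 4)  # [B, C, F, H, W]
    # Encode with the VAE and keep the latent distribution (not a sample)
    latent_dist = vae.encode(video).latent_dist
    return latent_dist

# Pre-encode every training video once, before the training loop starts
train_dataset.instance_videos = [encode_video(video) for video in train_dataset.instance_videos]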

This function gets stuck. It comes from https://github.com/THUDM/CogVideo/blob/main/finetune/train_cogvideox_lora.py
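One way to tell whether encoding is truly hung or just slow is to encode the videos one at a time and log how long each one takes. The sketch below is a generic, hypothetical helper (encode_with_timing is not part of the CogVideo script); encode_fn would be the encode_video function above, and the lambda in the usage line is only a stand-in for the VAE so the sketch runs without a GPU.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def encode_with_timing(
    items: Sequence[T],
    encode_fn: Callable[[T], R],
    log: Callable[[str], None] = print,
) -> List[R]:
    """Apply encode_fn to each item in order, logging progress and elapsed time.

    If the log stops after "starting item k", the hang is inside encode_fn for
    that item; if every item finishes but each takes a long time, encoding is
    slow rather than stuck.
    """
    results: List[R] = []
    total = len(items)
    for i, item in enumerate(items):
        log(f"starting item {i + 1}/{total}")
        start = time.perf_counter()
        results.append(encode_fn(item))
        elapsed = time.perf_counter() - start
        log(f"finished item {i + 1}/{total} in {elapsed:.2f}s")
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for the VAE encoder.
    print(encode_with_timing([1, 2, 3], lambda x: x * 10))
```

In the training script this would replace the list comprehension, e.g. `train_dataset.instance_videos = encode_with_timing(train_dataset.instance_videos, encode_video)`. On a GPU, kernel launches are asynchronous, so the per-item times are only approximate unless encode_fn calls `torch.cuda.synchronize()` before returning.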

Expected behavior

It should work normally.

foreverpiano commented 1 week ago

@zRzRzRzRzRzRzR

foreverpiano commented 1 week ago

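I'm not sure whether this is a data-format problem. Could you share what video.shape should be? I downloaded the data from the Disney dataset following the official instructions.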
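For comparing shapes, the sketch below traces the tensor layout through encode_video and estimates the resulting latent shape. It assumes each dataset sample is a [F, C, H, W] tensor (which is what the unsqueeze(0) plus permute(0, 2, 1, 3, 4) in encode_video implies), and it assumes the CogVideoX VAE compresses 4x in time (keeping the first frame) and 8x in space, with 16 latent channels. The function name trace_encode_shapes and the 49 x 480 x 720 example are hypothetical illustrations, not values confirmed in this thread.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def trace_encode_shapes(
    video_shape: Shape,
    temporal_ratio: int = 4,    # assumed VAE temporal compression
    spatial_ratio: int = 8,     # assumed VAE spatial compression
    latent_channels: int = 16,  # assumed number of VAE latent channels
) -> Tuple[Shape, Shape]:
    """Return (VAE input shape, estimated latent shape) for one [F, C, H, W] video."""
    if len(video_shape) != 4:
        raise ValueError(f"expected [F, C, H, W], got {video_shape}")
    f, c, h, w = video_shape
    if c != 3:
        raise ValueError(
            f"expected 3 channels in dim 1, got {c}; is the layout [F, H, W, C]?"
        )
    # unsqueeze(0) -> [1, F, C, H, W]; permute(0, 2, 1, 3, 4) -> [1, C, F, H, W]
    unsqueezed = (1, f, c, h, w)
    vae_input = tuple(unsqueezed[i] for i in (0, 2, 1, 3, 4))
    # Assumed causal temporal compression: first frame kept, rest grouped by temporal_ratio
    latent_frames = (f - 1) // temporal_ratio + 1
    latent = (1, latent_channels, latent_frames, h // spatial_ratio, w // spatial_ratio)
    return vae_input, latent


if __name__ == "__main__":
    print(trace_encode_shapes((49, 3, 480, 720)))
```

Under these assumptions the example prints ((1, 3, 49, 480, 720), (1, 16, 13, 60, 90)). A ValueError on a real sample would point to a layout mismatch before the VAE is ever called.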