PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model). We hope the open-source community will contribute to this project.
MIT License
11.61k stars 1.03k forks

Some questions about resuming causalvae training #229

Open ZhikangNiu opened 7 months ago

ZhikangNiu commented 7 months ago

When I try to resume training causalvae on our own dataset with the following script, it always reports the following error:

_pickle.UnpicklingError: invalid load key, '\xbb'.

Here is my script:

python opensora/train/train_causalvae.py \
    --exp_name "ucf" \
    --batch_size 1 \
    --precision bf16 \
    --max_steps 40000 \
    --save_steps 100 \
    --output_dir results/causalvae_ \
    --video_path /home/v-zhikangniu/Open-Sora-Plan/data/MSRVTT \
    --video_num_frames 17 \
    --resolution 256 \
    --sample_rate 1 \
    --n_nodes 1 \
    --devices 1 \
    --num_workers 8 \
    --model_config scripts/causalvae/release.json \
    --resume_from_checkpoint /home/v-zhikangniu/Open-Sora-Plan/checkpoint_v1/17x256x256/diffusion_pytorch_model.safetensors

I'm sure my code is up to date.

qqingzheng commented 7 months ago

The resume_from_checkpoint parameter only accepts the path to a checkpoint file (XXXX.ckpt) saved by PyTorch Lightning. What you probably want is the load_from_checkpoint parameter, which takes the path of the directory that contains config.json and the model file (the model file can be in either the HF format or the PL format).
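
To illustrate why a .safetensors file triggers this particular error, here is a minimal sketch, assuming the resume path ends up in a pickle-based loader. A safetensors file starts with an 8-byte little-endian header length, not a pickle opcode, so unpickling fails on the very first byte. The functions fake_safetensors_bytes, try_unpickle and suggest_flag are hypothetical helpers written for this example, not part of Open-Sora-Plan, and the `--` flag spellings are assumed from the script above.

```python
import json
import pickle
import struct
from pathlib import Path


def fake_safetensors_bytes(header_len: int = 0xBB) -> bytes:
    """Build bytes laid out like the start of a safetensors file:
    an 8-byte little-endian header length followed by a JSON header."""
    header = json.dumps({"__metadata__": {"format": "pt"}}).encode("utf-8")
    # Pad the JSON header with trailing spaces so its length is header_len.
    header = header.ljust(header_len, b" ")
    return struct.pack("<Q", len(header)) + header


def try_unpickle(data: bytes) -> str:
    """Return the message raised when a pickle-based loader reads these bytes."""
    try:
        pickle.loads(data)
    except pickle.UnpicklingError as exc:
        return str(exc)
    return "no error"


def suggest_flag(path: str) -> str:
    """Hypothetical helper: pick the argument that matches the kind of path."""
    if Path(path).suffix == ".ckpt":
        # A PyTorch Lightning checkpoint file can be resumed directly.
        return "--resume_from_checkpoint"
    # Anything else (a weights file or its directory) goes through
    # load_from_checkpoint, which expects the directory with config.json.
    return "--load_from_checkpoint"


if __name__ == "__main__":
    print(try_unpickle(fake_safetensors_bytes()))
    print(suggest_flag("checkpoint_v1/17x256x256/diffusion_pytorch_model.safetensors"))
```

With a 187-byte (0xBB) header, the first byte of the fake file is 0xBB, which is the same load key as in the traceback above. For the script in the question, this suggests passing the checkpoint_v1/17x256x256 directory to load_from_checkpoint instead of passing the .safetensors file to resume_from_checkpoint.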