PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
Apache License 2.0

Sample Error #77

Open luo3300612 opened 4 months ago

luo3300612 commented 4 months ago

Two days ago, I trained a DiT-XL with the following command:

torchrun --nproc_per_node=8 src/train.py \
  --model DiT-XL/122 \
  --vae ucf101_stride4x4x4 \
  --data-path ./UCF-101 --num-classes 101 \
  --sample-rate 2 --num-frames 8 --max-image-size 128 --clip-grad-norm 1 \
  --epochs 14000 --global-batch-size 64 --lr 1e-4 \
  --ckpt-every 4000 --log-every 1000 \
  --results-dir ./exp1

Today, I tried to sample a video with:

python opensora/sample/sample.py \
  --model DiT-XL/122 --ae ucf101_stride4x4x4 \
  --ckpt ./exp1/000-DiT-XL-122/checkpoints/0012000.pt --extras 1 \
  --fps 10 --num-frames 16 --image-size 256

However, I got:

    model.load_state_dict(state_dict)
  File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DiT:
        Unexpected key(s) in state_dict: "y_embedder.embedding_table.weight".

Thank you for taking the time to look into this issue. I look forward to your response.

LinB203 commented 4 months ago

Fixed that. Use --extras 1 to avoid it.

https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/main/opensora/models/diffusion/dit/dit.py#L239
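As a workaround for an already-saved checkpoint, the mismatched key can also be dropped from the state dict before loading. This is a minimal sketch with a toy nn.Linear standing in for the real DiT model; the key name is the one from the traceback above, everything else is illustrative:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real DiT model (hypothetical simplification)
model = nn.Linear(4, 4)

# Simulate a checkpoint that carries an extra class-conditioning key,
# as in the RuntimeError above
state_dict = model.state_dict()
state_dict["y_embedder.embedding_table.weight"] = torch.zeros(101, 4)

# Dropping the unexpected key before loading avoids the RuntimeError
state_dict.pop("y_embedder.embedding_table.weight", None)
model.load_state_dict(state_dict)
```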

junwenxiong commented 4 months ago

It seems that sample.py ignores the attention_mask required by the DiT forward function.

  File "/mnt/workspace/Text-to-Video/Open-Sora-Plan/opensora/models/diffusion/diffusion/respace.py", line 130, in __call__
    return self.model(x, new_ts, **kwargs)
TypeError: forward() missing 1 required positional argument: 'attention_mask'

https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/f1542802351a3df5c9c66732db2d265d9e49c525/opensora/sample/sample.py#L80
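For reference, respace.py forwards whatever is in kwargs to the model, so the TypeError means attention_mask was never put into that dict. A minimal sketch of the failure mode and fix, with a toy function standing in for DiT.forward (hypothetical simplification):

```python
import torch

# Toy stand-in for DiT.forward, which takes attention_mask as a
# required argument (hypothetical simplification of the real model)
def forward(x, t, attention_mask):
    return x * attention_mask

x, t = torch.randn(1, 4), torch.zeros(1)

# respace.py calls self.model(x, new_ts, **kwargs); if sample.py never
# puts attention_mask into kwargs, that call raises the TypeError above.
# Supplying it (here an all-ones mask, i.e. attend everywhere) avoids it:
kwargs = {"attention_mask": torch.ones(1, 4)}
out = forward(x, t, **kwargs)
```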

LinB203 commented 4 months ago

It seems that sample.py ignores the attention_mask required by the DiT forward function.

  File "/mnt/workspace/Text-to-Video/Open-Sora-Plan/opensora/models/diffusion/diffusion/respace.py", line 130, in __call__
    return self.model(x, new_ts, **kwargs)
TypeError: forward() missing 1 required positional argument: 'attention_mask'

https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/f1542802351a3df5c9c66732db2d265d9e49c525/opensora/sample/sample.py#L80

Fixed that.

xinyuxiao commented 4 months ago

If you have inference results, how is their quality? Can you show some cases?

LinB203 commented 3 months ago

If you have inference results, how is their quality? Can you show some cases?

See https://github.com/PKU-YuanGroup/Open-Sora-Plan/tree/main?tab=readme-ov-file#sampling