Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0

fvd evaluation error on the sky dataset #117

Open yizenghan opened 2 weeks ago

yizenghan commented 2 weeks ago

Hi, my evaluation on other datasets now works fine. However, when I run the same process:

python tools/calc_metrics_for_dataset.py --real_data_path path/to/sky_timelapse/sky_train/ --fake_data_path videos/sky/sky_baseline --mirror 1 --resolution 256 --metrics fvd2048_16f --gpus 1 --verbose True --use_cache 0

it raises the following error: AssertionError: Video directories should be inside the root dir. 08ug3bzhV8Y is not.

I checked the organization of sky_timelapse/sky_train/ and found that each video folder contains one or more subfolders, so the frames sit two levels below the root.

How can I evaluate the FVD correctly?

maxin-cn commented 2 weeks ago

The dataset structure should follow https://github.com/Vchitect/Latte/blob/main/docs/datasets_evaluation.md#dataset-structure:

Root/video1/frame_xxx.jpg
Root/video2/frame_xxx.jpg

yizenghan commented 2 weeks ago

I downloaded the dataset from your provided link https://huggingface.co/datasets/maxin-cn/SkyTimelapse/tree/main. Should I remove the second level so that each frame folder sits directly under the root?

maxin-cn commented 2 weeks ago

The script for calculating FVD requires the dataset format I just mentioned, so you need to reorganize the dataset into that format.
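
For reference, here is a minimal sketch (not part of the repo) of one way to flatten the two-level layout; the SRC/DST paths and the folder-naming scheme are placeholders:

# Hypothetical helper: flatten sky_train/<video>/<clip>/frame_xxx.jpg
# into sky_train_fvd/<video>_<clip>/frame_xxx.jpg so every frame folder
# sits directly under the root, as the FVD script expects.
import os
import shutil

SRC = "path/to/sky_timelapse/sky_train"       # original two-level layout
DST = "path/to/sky_timelapse/sky_train_fvd"   # flattened layout for calc_metrics_for_dataset.py

os.makedirs(DST, exist_ok=True)
for video in sorted(os.listdir(SRC)):
    video_dir = os.path.join(SRC, video)
    if not os.path.isdir(video_dir):
        continue
    for clip in sorted(os.listdir(video_dir)):
        clip_dir = os.path.join(video_dir, clip)
        if os.path.isdir(clip_dir):
            # each clip folder becomes one "video" directly under the new root
            shutil.copytree(clip_dir, os.path.join(DST, f"{video}_{clip}"))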

yizenghan commented 1 week ago

Hi, I have reorganized the sky dataset into the required format.

Now it runs without errors. However, it produces FVD values around 210+, which is much higher than the value reported in the paper (<60). My script is as follows:

echo "Start sampling sky_baseline..."
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nnodes=1 --nproc_per_node=8 --master_port=29536 sample/sample_ddp_baseline.py \
--config configs/sky/sky_sample.yaml \
--ckpt  ckpts/skytimelapse.pt \
--save_video_path videos/sky/sky_baseline_videos/ 

echo "Start converting sky_baseline..."
python tools/convert_videos_to_frames.py \
-s videos/sky/sky_baseline_videos \
-t videos/sky/sky_baseline --target_size 256 --video_ext mp4

CUDA_VISIBLE_DEVICES=0 python tools/calc_metrics_for_dataset.py --real_data_path /path/to/sky_timelapse/sky_train_fvd --fake_data_path videos/sky/sky_baseline \
    --mirror 1 --resolution 256 --metrics fvd2048_16f --gpus 1 --verbose True --use_cache 0

The generated videos seem normal.

maxin-cn commented 1 week ago

The original sky dataset is not at 256×256 resolution. Could you let me know whether you resized it in advance for evaluation?

yizenghan commented 1 week ago

No, I did not. Could you please share the full pipeline for evaluation and training on this dataset?

maxin-cn commented 1 week ago

You can use this for resizing: https://github.com/Vchitect/Latte/blob/c1650af18f41a73f043e9bfeb06f97abaf26530c/tools/convert_videos_to_frames.py#L95C28-L95C39

As for training, Latte does center_crop_resize on the fly: https://github.com/Vchitect/Latte/blob/c1650af18f41a73f043e9bfeb06f97abaf26530c/datasets/sky_datasets.py#L92
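
For context, a common center-crop-then-resize transform with torchvision looks roughly like the sketch below; this is only an illustration, not the exact code in sky_datasets.py:

# Rough sketch of an on-the-fly center-crop-and-resize transform (torchvision);
# the order and details in datasets/sky_datasets.py may differ.
import torchvision.transforms as T

image_size = 256  # target training resolution
transform = T.Compose([
    T.Resize(image_size),      # scale so the shorter side becomes 256
    T.CenterCrop(image_size),  # then take the centered 256x256 crop
])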

yizenghan commented 1 week ago

Hi, I found that the provided sky dataset only contains JPG files. How should I use this Python code? Could you share your usage script or the processed dataset for evaluation?

maxin-cn commented 1 week ago

It is currently a bit difficult to find the script I previously used to resize the images. You can write your own multiprocessing script to center-crop and resize each image.
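
For example, a rough multiprocessing sketch with PIL is shown below; the SRC/DST paths and the 256 target size are assumptions, not the author's original script:

# Hypothetical offline preprocessing: center-crop each frame to a square,
# then resize it to 256x256, processing frames in parallel.
import os
from multiprocessing import Pool
from PIL import Image

SRC = "path/to/sky_timelapse/sky_train_fvd"       # flattened Root/videoN/frame_xxx.jpg layout
DST = "path/to/sky_timelapse/sky_train_fvd_256"   # output root at 256x256
SIZE = 256

def process_one(rel_path):
    src, dst = os.path.join(SRC, rel_path), os.path.join(DST, rel_path)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    img = Image.open(src).convert("RGB")
    w, h = img.size
    s = min(w, h)
    # largest centered square crop, then resize to the target resolution
    img = img.crop(((w - s) // 2, (h - s) // 2, (w + s) // 2, (h + s) // 2))
    img.resize((SIZE, SIZE), Image.BICUBIC).save(dst, quality=95)

if __name__ == "__main__":
    jobs = [os.path.join(v, f)
            for v in os.listdir(SRC) if os.path.isdir(os.path.join(SRC, v))
            for f in os.listdir(os.path.join(SRC, v)) if f.endswith(".jpg")]
    with Pool() as pool:
        pool.map(process_one, jobs)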