Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0

fvd evaluation error on the sky dataset #117

Closed yizenghan closed 2 months ago

yizenghan commented 3 months ago

Hi, my evaluation on the other datasets now works as expected. However, when I run the same process on the sky dataset:

python tools/calc_metrics_for_dataset.py --real_data_path path/to/sky_timelapse/sky_train/ --fake_data_path videos/sky/sky_baseline --mirror 1 --resolution 256 --metrics fvd2048_16f --gpus 1 --verbose True --use_cache 0

it raises an error: AssertionError: Video directories should be inside the root dir. 08ug3bzhV8Y is not.

I checked how sky_timelapse/sky_train/ is organized and found that each folder contains one or more subfolders, as below:

How can I compute the FVD values correctly on this dataset?

maxin-cn commented 3 months ago

The dataset structure should follow https://github.com/Vchitect/Latte/blob/main/docs/datasets_evaluation.md#dataset-structure:

Root/video1/frame_xxx.jpg
Root/video2/frame_xxx.jpg
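A quick way to see why the assertion fires is to scan the root for directories that sit one level too deep. The following is a minimal sketch, not part of the Latte repository; the function name `find_layout_problems` and the set of frame extensions are assumptions made for illustration.

```python
from pathlib import Path

FRAME_EXTS = {".jpg", ".jpeg", ".png"}  # assumed frame extensions


def find_layout_problems(root):
    """Return (nested_dirs, stray_files) that break the Root/video/frame layout."""
    root = Path(root)
    nested_dirs = []  # directories found inside a video directory
    stray_files = []  # frames sitting directly in the root
    for entry in sorted(root.iterdir()):
        if entry.is_file():
            if entry.suffix.lower() in FRAME_EXTS:
                stray_files.append(entry)
            continue
        # entry is a candidate video directory; it should contain frames only
        for child in sorted(entry.iterdir()):
            if child.is_dir():
                nested_dirs.append(child)
    return nested_dirs, stray_files
```

If `nested_dirs` is non-empty, the dataset has an extra directory level, which matches the situation described in this issue.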

yizenghan commented 3 months ago

I downloaded the dataset from the link you provided: https://huggingface.co/datasets/maxin-cn/SkyTimelapse/tree/main. Should I remove the second directory level and convert the structure to this:

maxin-cn commented 3 months ago

The FVD script requires the dataset to be in the format I just described, so you need to reorganize the dataset into that format.
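The reorganization can be sketched as follows. This is an illustrative helper rather than the maintainers' script: `flatten_clips` is a hypothetical name, and joining the path parts with an underscore is an assumed naming scheme. Every directory that directly contains frames is copied to its own folder under a new root, which should be located outside the source tree.

```python
import shutil
from pathlib import Path


def flatten_clips(src_root, dst_root, exts=(".jpg", ".jpeg", ".png")):
    """Copy each directory that directly holds frames to dst_root/<flat_name>/."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for clip_dir in sorted(p for p in src_root.rglob("*") if p.is_dir()):
        frames = sorted(f for f in clip_dir.iterdir()
                        if f.is_file() and f.suffix.lower() in exts)
        if not frames:
            continue  # this directory only holds sub-directories
        # Join the relative path parts (video1/clip_a -> video1_clip_a) so
        # clips from different videos do not overwrite each other.
        flat_name = "_".join(clip_dir.relative_to(src_root).parts)
        out_dir = dst_root / flat_name
        out_dir.mkdir(parents=True, exist_ok=True)
        for frame in frames:
            shutil.copy2(frame, out_dir / frame.name)
        copied.append(flat_name)
    return copied
```

Copying keeps the original download intact; symlinking the frames instead would save disk space if the evaluation script follows links, which is not verified here.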

yizenghan commented 2 months ago

Hi, I have re-organized the sky dataset into the right format:

It now runs without errors. However, it produces FVD values of around 210+, which is much higher than the value reported in the paper (<60). My script is as follows:

echo "Start sampling sky_baseline..."
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nnodes=1 --nproc_per_node=8 --master_port=29536 sample/sample_ddp_baseline.py \
--config configs/sky/sky_sample.yaml \
--ckpt  ckpts/skytimelapse.pt \
--save_video_path videos/sky/sky_baseline_videos/ 

echo "Start converting sky_baseline..."
python tools/convert_videos_to_frames.py \
-s videos/sky/sky_baseline_videos \
-t videos/sky/sky_baseline --target_size 256 --video_ext mp4

CUDA_VISIBLE_DEVICES=0 python tools/calc_metrics_for_dataset.py \
--real_data_path /path/to/sky_timelapse/sky_train_fvd \
--fake_data_path videos/sky/sky_baseline \
--mirror 1 --resolution 256 --metrics fvd2048_16f --gpus 1 --verbose True --use_cache 0

The generated videos seem normal.

maxin-cn commented 2 months ago

The original sky dataset does not have 256*256 resolution. Could you let me know if you resized the sky dataset in advance for evaluation?

yizenghan commented 2 months ago

No, I did not. Could you please share the full pipeline for evaluation and training on this dataset?

maxin-cn commented 2 months ago

You can use this for resizing: https://github.com/Vchitect/Latte/blob/c1650af18f41a73f043e9bfeb06f97abaf26530c/tools/convert_videos_to_frames.py#L95C28-L95C39

As for training, Latte does center_crop_resize on the fly: https://github.com/Vchitect/Latte/blob/c1650af18f41a73f043e9bfeb06f97abaf26530c/datasets/sky_datasets.py#L92

yizenghan commented 2 months ago

Hi, I found that the provided sky dataset only contains jpg files. How should I use this Python code on it? Could you share the usage script or the processed dataset for evaluation?

maxin-cn commented 2 months ago

The script I previously used to resize the images is hard to track down at the moment. You can write your own multiprocess script to center crop and resize each image.
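One possible shape for such a script is sketched below. It is an assumption-laden illustration, not the code used for the paper: the function names are hypothetical, the crop is the largest centered square, and bicubic interpolation and JPEG quality 95 are arbitrary choices that may differ from the `center_crop_resize` referenced above. It requires Pillow.

```python
from multiprocessing import Pool
from pathlib import Path


def center_crop_box(width, height):
    """Return the (left, upper, right, lower) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return left, upper, left + side, upper + side


def resize_one(job):
    """Center crop one image to a square and resize it; job = (src, dst, size)."""
    # Pillow (third-party) is imported lazily so the pure-Python helper
    # above can be used without it.
    from PIL import Image

    src, dst, size = job
    with Image.open(src) as img:
        img = img.convert("RGB")
        img = img.crop(center_crop_box(*img.size))
        img = img.resize((size, size), resample=Image.BICUBIC)  # assumed filter
        Path(dst).parent.mkdir(parents=True, exist_ok=True)
        img.save(dst, quality=95)  # assumed JPEG quality
    return dst


def resize_dataset(src_root, dst_root, size=256, workers=8):
    """Mirror src_root into dst_root with every .jpg center cropped and resized."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    jobs = [(str(p), str(dst_root / p.relative_to(src_root)), size)
            for p in sorted(src_root.rglob("*.jpg"))]
    with Pool(workers) as pool:
        return pool.map(resize_one, jobs, chunksize=64)
```

Call `resize_dataset(...)` from inside an `if __name__ == "__main__":` guard, since `multiprocessing` re-imports the module in worker processes on platforms that spawn rather than fork. The resized output directory is then what would be passed as `--real_data_path`.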

yizenghan commented 2 months ago

Center-cropping + resizing seem to yield reasonable results. Thanks.