Closed yizenghan closed 2 months ago
The dataset structure should follow https://github.com/Vchitect/Latte/blob/main/docs/datasets_evaluation.md#dataset-structure:
Root/video1/frame_xxx.jpg
Root/video2/frame_xxx.jpg
I downloaded the dataset from your provided link https://huggingface.co/datasets/maxin-cn/SkyTimelapse/tree/main. Should I remove the second level and convert the structure to this:
The script for calculating FVD requires the dataset layout I just described, so you need to restructure the dataset accordingly.
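For anyone hitting the same layout problem, here is a minimal sketch of one way to flatten a nested dataset into the Root/videoN/frame_xxx.jpg layout. It is not part of the Latte repo: the function name `flatten_clips`, the extension list, and the choice to join nested folder names with underscores are all assumptions made for illustration. It copies rather than moves, and assumes the destination lies outside the source tree.

```python
import shutil
from pathlib import Path

# Assumption: frames are stored as ordinary image files with these extensions.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def flatten_clips(src_root: Path, dst_root: Path) -> int:
    """Copy every directory that directly holds frames to dst_root/<name>/.

    Returns the number of clip directories written. dst_root is assumed
    to be outside src_root.
    """
    count = 0
    for clip_dir in sorted(p for p in src_root.rglob("*") if p.is_dir()):
        frames = sorted(
            f for f in clip_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTS
        )
        if not frames:
            continue  # intermediate folder that only contains subfolders
        # Join the relative path parts so clips from different parent
        # folders cannot collide after flattening (hypothetical naming).
        name = "_".join(clip_dir.relative_to(src_root).parts)
        out_dir = dst_root / name
        out_dir.mkdir(parents=True, exist_ok=True)
        for frame in frames:
            shutil.copy2(frame, out_dir / frame.name)
        count += 1
    return count


# Example call (paths are placeholders):
# flatten_clips(Path("/path/to/sky_timelapse/sky_train"),
#               Path("/path/to/sky_timelapse/sky_train_fvd"))
```

Copying keeps the original download untouched, at the cost of extra disk space; swapping `shutil.copy2` for `shutil.move` would avoid that if the original is no longer needed.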
Hi, I have re-organized the sky dataset into the right format:
Now it runs without errors. However, it reports FVD values of 210+, which is much higher than the value reported in the paper (<60). My script is as follows:
echo "Start sampling sky_baseline..."
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nnodes=1 --nproc_per_node=8 --master_port=29536 sample/sample_ddp_baseline.py \
--config configs/sky/sky_sample.yaml \
--ckpt ckpts/skytimelapse.pt \
--save_video_path videos/sky/sky_baseline_videos/
echo "Start converting sky_baseline..."
python tools/convert_videos_to_frames.py \
-s videos/sky/sky_baseline_videos \
-t videos/sky/sky_baseline --target_size 256 --video_ext mp4
CUDA_VISIBLE_DEVICES=0 python tools/calc_metrics_for_dataset.py \
--real_data_path /path/to/sky_timelapse/sky_train_fvd \
--fake_data_path videos/sky/sky_baseline \
--mirror 1 --resolution 256 --metrics fvd2048_16f --gpus 1 --verbose True --use_cache 0
The generated videos seem normal.
The original sky dataset does not have 256*256 resolution. Could you let me know if you resized the sky dataset in advance for evaluation?
No, I did not. Could you please share the full pipeline for evaluation and training on this dataset?
You can use this for resizing: https://github.com/Vchitect/Latte/blob/c1650af18f41a73f043e9bfeb06f97abaf26530c/tools/convert_videos_to_frames.py#L95C28-L95C39
As for training, Latte does center_crop_resize on the fly: https://github.com/Vchitect/Latte/blob/c1650af18f41a73f043e9bfeb06f97abaf26530c/datasets/sky_datasets.py#L92
Hi, I found that the provided sky dataset only contains jpg files. How should I use this Python code? Could you share the usage script or the processed dataset for evaluation?
The script I used to resize the images earlier is hard to track down at the moment. You can write your own multiprocess script to center crop and resize each image.
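A rough sketch of such a multiprocess script is below. It is not the script used for the paper: it does a plain center crop to a square followed by a bicubic resize with Pillow, which may differ in detail from the on-the-fly transform Latte applies during training. The function names, the `*.jpg` glob, the JPEG quality, and the worker count are all assumptions.

```python
from multiprocessing import Pool
from pathlib import Path


def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def resize_one(job):
    """Center crop and resize a single image; job = (src, dst, size)."""
    # Requires Pillow. Imported here so the helper above stays dependency-free.
    from PIL import Image

    src, dst, size = job
    with Image.open(src) as img:
        img = img.convert("RGB")
        img = img.crop(center_crop_box(*img.size))
        img = img.resize((size, size), Image.BICUBIC)
        dst.parent.mkdir(parents=True, exist_ok=True)
        img.save(dst, quality=95)  # assumed JPEG quality
    return dst


def resize_dataset(src_root, dst_root, size=256, workers=8):
    """Mirror src_root into dst_root with every .jpg cropped and resized."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    jobs = [
        (p, dst_root / p.relative_to(src_root), size)
        for p in sorted(src_root.rglob("*.jpg"))
    ]
    with Pool(workers) as pool:
        for _ in pool.imap_unordered(resize_one, jobs, chunksize=64):
            pass  # drain the iterator so every job is executed
    return len(jobs)


# Example call (paths are placeholders). On platforms that spawn worker
# processes, run this under `if __name__ == "__main__":`.
# resize_dataset("/path/to/sky_train_fvd", "/path/to/sky_train_fvd_256")
```

Writing to a separate output root keeps the original frames intact, so the crop or interpolation choice can be changed and rerun if the FVD still looks off.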
Center-cropping + resizing seems to yield reasonable results. Thanks.
Hi, my evaluation on other datasets now works normally. However, when I run the same process:
python tools/calc_metrics_for_dataset.py --real_data_path path/to/sky_timelapse/sky_train/ --fake_data_path videos/sky/sky_baseline --mirror 1 --resolution 256 --metrics fvd2048_16f --gpus 1 --verbose True --use_cache 0
it raises an error: AssertionError: Video directories should be inside the root dir. 08ug3bzhV8Y is not.
I checked the organization of sky_timelapse/sky_train/ and found that each folder contains one or more subfolders, as shown below:
How can I compute the FVD values correctly?
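As a quick diagnostic, a small check like the following can list which top-level folders break the expected Root/videoN/frame_xxx.jpg layout by holding subfolders. This is only a sketch against the documented layout; it does not reproduce the actual assertion inside the metrics script, and `find_nested_dirs` is a hypothetical helper name.

```python
from pathlib import Path


def find_nested_dirs(root):
    """Return names of first-level directories under root that contain subdirectories."""
    nested = []
    for video_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # A conforming video folder holds frames only, no further directories.
        if any(child.is_dir() for child in video_dir.iterdir()):
            nested.append(video_dir.name)
    return nested


# Example call (path is a placeholder):
# print(find_nested_dirs("path/to/sky_timelapse/sky_train/"))
```

An empty result means every first-level folder holds files only; any name that is returned points at a folder whose clips still sit one level too deep.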