Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0

Question: evaluate the FVD #80

Closed Alienge closed 1 month ago

Alienge commented 1 month ago

I followed the settings in issue https://github.com/Vchitect/Latte/issues/65#issuecomment-2043988865 to calculate the FVD on the sky dataset, with frame_interval=3, but the FVD is higher than the value reported in the paper.

{"results": {"fvd2048_16f": 233.66249157395492}, "metric": "fvd2048_16f", "total_time": 228.99126720428467, "total_time_str": "3m 49s", "num_gpus": 1, "snapshot_pkl": null, "timestamp": 1716430907.0302753}
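For reference, the clip sampling implied by the settings above (16 frames per clip, frame_interval=3) might look like the sketch below. It assumes that frame_interval is the stride between consecutive sampled frames; the function name and the `start` parameter are hypothetical and are not taken from the Latte or StyleGAN-V code.

```python
def sample_frame_indices(num_video_frames, clip_len=16, frame_interval=3, start=0):
    """Return the indices of `clip_len` frames spaced `frame_interval` apart.

    Assumption: frame_interval is the stride between consecutive sampled frames.
    """
    # Number of source frames the clip spans, from first to last sampled frame.
    span = (clip_len - 1) * frame_interval + 1
    if start < 0 or start + span > num_video_frames:
        raise ValueError(
            f"video has {num_video_frames} frames, "
            f"but the clip needs frames {start}..{start + span - 1}"
        )
    return [start + i * frame_interval for i in range(clip_len)]
```

Under this assumption a 16-frame clip spans 46 source frames, so shorter videos cannot supply a full clip.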

The empty directories and "z6Yr7KDZhm0" were deleted. The original sky training dataset and the generated dataset are both organized in the following directory structure:

dataset/
    video1/
        - frame1.jpg
        - frame2.jpg
        - ...
    video2/
        - frame1.jpg
        - frame2.jpg
        - ...
    ...
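
A layout like the one above can be sanity-checked before running the metric. The following is a minimal sketch, not part of the Latte or StyleGAN-V tooling: `scan_dataset`, `FRAME_SUFFIXES`, and the `min_frames` default are hypothetical names and values chosen for illustration.

```python
from pathlib import Path

# Assumed frame file extensions; adjust to match the dataset.
FRAME_SUFFIXES = {".jpg", ".jpeg", ".png"}


def scan_dataset(root, min_frames=16):
    """Count frames per video directory and list directories with too few frames."""
    counts = {}
    for video_dir in sorted(Path(root).iterdir()):
        if not video_dir.is_dir():
            continue  # ignore stray files at the top level
        frames = [p for p in video_dir.iterdir() if p.suffix.lower() in FRAME_SUFFIXES]
        counts[video_dir.name] = len(frames)
    # Empty directories show up here with a count of 0.
    too_short = [name for name, n in counts.items() if n < min_frames]
    return counts, too_short
```

Running it on both the real and the generated directory shows whether any empty or short video folders remain.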

Could you kindly give some advice?

Additionally, which dataset did you use for training in your repo? I found two versions of the sky_timelapse dataset: the original dataset and sky_timelapse256.

The FVD seems normal on sky_timelapse256.

{"results": {"fvd2048_16f": 49.99966394576661}, "metric": "fvd2048_16f", "total_time": 113.27880835533142, "total_time_str": "1m 53s", "num_gpus": 1, "snapshot_pkl": null, "timestamp": 1716433444.7904737}

maxin-cn commented 1 month ago

Hi, thanks for your interest. I use the original dataset but resize it to 256 during the data preprocessing phase (see here https://github.com/Vchitect/Latte/blob/217fd51407dd780bc65330ec6d44b0a210e61971/datasets/__init__.py#L65). The evaluation should be performed on the dataset after resizing.
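
As an illustration of evaluating on resized data, the frames could be resized offline before computing FVD. This is only a sketch under stated assumptions: it uses a center crop followed by a bicubic resize with Pillow, which may not match the transform in the linked file exactly, and `center_crop_box` and `resize_dataset` are hypothetical helper names.

```python
from pathlib import Path


def center_crop_box(width, height):
    """Return the (left, top, right, bottom) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def resize_dataset(src_root, dst_root, size=256):
    """Center-crop and resize every .jpg under src_root, mirroring the layout in dst_root."""
    # Pillow is imported lazily so that center_crop_box has no third-party dependency.
    from PIL import Image

    src_root, dst_root = Path(src_root), Path(dst_root)
    for src in sorted(src_root.rglob("*.jpg")):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(src) as img:
            # Assumption: square center crop, then bicubic resize to size x size.
            cropped = img.convert("RGB").crop(center_crop_box(*img.size))
            cropped.resize((size, size), Image.BICUBIC).save(dst)
```

The linked preprocessing code remains the reference for the crop and interpolation actually used in training.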

Alienge commented 1 month ago

Thanks for your kind reply. The Latte work is elegant. I now understand the source of the training data. One last point of confusion: to evaluate the FVD, should I 1) use the sky_timelapse256 dataset, or 2) resize the original sky_timelapse dataset to 256 resolution?

huangjch526 commented 1 month ago

sky_timelapse256: where can I download this dataset? Thanks.

Alienge commented 1 month ago

Sorry for the missing link. Here

maxin-cn commented 1 month ago

Select 2. Yes, I downloaded the sky dataset from this link.

Alienge commented 1 month ago

OK. In summary, Latte trains on the official dataset from https://github.com/weixiong-ur/mdgan and evaluates FVD against the data from stylegan-v.