Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0

how to place and preprocess these datasets #78

Open renyuanzhe opened 1 month ago

renyuanzhe commented 1 month ago

FaceForensics, SkyTimelapse, UCF101, and Taichi-HD

maxin-cn commented 1 month ago

You can refer to https://github.com/Vchitect/Latte/issues/35. If it's still not clear, I'll share the dataset structure.

renyuanzhe commented 1 month ago

Should I write the preprocessing code myself, or does your repository already contain the needed code?

renyuanzhe commented 1 month ago

Also, could you share the directory structure these datasets should have in the project? I wonder whether the structure is modified during preprocessing.

maxin-cn commented 1 month ago

All datasets keep their original directory structure, in one of the two layouts below (video files, or one folder of extracted frames per video), and no additional operations are required.

ROOT:
├── train
│   ├── video1.mp4
│   └── video2.mp4
└── test
    ├── video1.mp4
    └── video2.mp4

or

ROOT:
├── train
│   ├── video1.mp4
│   │   ├── frame_0001.png
│   │   └── frame_0002.png
│   └── video2.mp4
│       ├── frame_0001.png
│       └── frame_0002.png
└── test
    ├── video1.mp4
    │   ├── frame_0001.png
    │   └── frame_0002.png
    └── video2.mp4
        ├── frame_0001.png
        └── frame_0002.png
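
As an illustration only (this is not code from the Latte repository), the two layouts above could be enumerated with a small helper like the following. The function name `list_samples` and the assumption that frames are `.png` files are hypothetical; the repository's own dataset classes may read the folders differently.

```python
from pathlib import Path


def list_samples(root, split="train"):
    """Return a sorted list of (sample_name, paths) for one split.

    Hypothetical helper, not part of Latte. It accepts both layouts:
    - a sample stored as a single video file -> paths is [that file]
    - a sample stored as a folder of frames  -> paths is the sorted .png frames
    """
    split_dir = Path(root) / split
    samples = []
    for entry in sorted(split_dir.iterdir()):
        if entry.is_dir():
            # Frame layout: one folder per video, one image per frame.
            frames = sorted(p for p in entry.iterdir() if p.suffix.lower() == ".png")
            samples.append((entry.name, frames))
        elif entry.is_file():
            # Video layout: the file itself is the sample.
            samples.append((entry.name, [entry]))
    return samples
```

Calling `list_samples("ROOT", "train")` and `list_samples("ROOT", "test")` would then give one entry per video in either layout, which is a quick way to check that a dataset has been placed as described.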
renyuanzhe commented 1 month ago

Thank you, I have preprocessed the dataset. However, I find that the input image size is 32 in the code, which differs from the 256 in the paper. Is something wrong?

maxin-cn commented 1 month ago

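The encoder downsamples the video spatially from 256 to 32, so the 32 in the code is the size after encoding, not the 256 input resolution given in the paper.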
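
The arithmetic behind this answer can be sketched as follows. The factor of 8 is simply 256 / 32 as stated in the reply; the function name `latent_size` and its default argument are assumptions made for illustration, not identifiers from the Latte code.

```python
def latent_size(image_size: int, downsample_factor: int = 8) -> int:
    """Spatial size after the encoder, assuming a fixed integer downsampling factor.

    The default factor of 8 is inferred from the 256 -> 32 figures in this thread.
    """
    if image_size % downsample_factor != 0:
        raise ValueError("image_size must be divisible by downsample_factor")
    return image_size // downsample_factor
```

Under that assumption, `latent_size(256)` gives 32, matching the value seen in the code.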