PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model). We hope the open-source community will contribute to this project.
MIT License
11.25k stars, 1k forks

[feat] Dataset embeddings/latents caching for more flexible experiments #136

Open kabachuha opened 6 months ago

kabachuha commented 6 months ago

Running the VAEs and the CLIP/T5 embedders is time-consuming, and this cost adds up quickly when training is re-run multiple times.

Since we keep these parts frozen and train only the diffusion model, we can precompute their outputs once and store them on disk as raw tensors, to be reused in every training run.

For a possible implementation, see:

https://github.com/ExponentialML/Text-To-Video-Finetuning/blob/main/train.py

https://github.com/ExponentialML/Text-To-Video-Finetuning/blob/main/utils/dataset.py
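To make the idea concrete, here is a minimal sketch of the caching step. It is not taken from the linked repository or from Open-Sora-Plan. The names `cache_path` and `encode_with_cache`, the `tag` argument, and the one-file-per-sample layout are all hypothetical. The encoder is passed in as a plain callable, and `pickle` stands in for the serializer so the sketch runs without extra dependencies; with PyTorch tensors one would likely use `torch.save` and `torch.load` instead.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Any, Callable


def cache_path(cache_dir: Path, sample_id: str, tag: str) -> Path:
    """Map a sample ID and an encoder tag to a stable file name (hypothetical layout)."""
    digest = hashlib.sha256(f"{tag}:{sample_id}".encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.pkl"


def encode_with_cache(
    sample_id: str,
    raw: Any,
    encode_fn: Callable[[Any], Any],
    cache_dir: Path,
    tag: str,
) -> Any:
    """Return the encoder output for `raw`, computing it only on a cache miss.

    `encode_fn` stands in for a frozen VAE or text embedder. `tag` keeps the
    outputs of different encoders (e.g. "vae" vs. "t5") in separate files.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_path(cache_dir, sample_id, tag)

    # Cache hit: skip the expensive forward pass entirely.
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)

    # Cache miss: run the encoder once and persist the result.
    result = encode_fn(raw)

    # Write to a temporary file first, then rename, so an interrupted run
    # does not leave a truncated cache file under the final name.
    tmp = path.with_suffix(".tmp")
    with tmp.open("wb") as f:
        pickle.dump(result, f)
    tmp.replace(path)
    return result
```

One caveat with this approach: the cache is only valid while the frozen encoders and the preprocessing (resolution, frame sampling, tokenizer settings) stay the same, so it may be worth including those settings in `tag`. Random augmentations applied before the VAE would also be frozen into the cached latents.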

LinB203 commented 6 months ago

Good job, this would be beneficial for training models on large datasets.