Open aravindhv10 opened 2 months ago
Since we are not training the VAE or any of the text encoders anyway, we can cache the VAE latents and text embeddings. This gives big speed-ups and a large reduction in memory usage. I have made a crude implementation here:
https://github.com/aravindhv10/x-flux/blob/aravind_prodigy_dataset/make_latent.py#L165
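The caching pattern is roughly: run each image through the frozen VAE and each prompt through the frozen text encoder once, and save the results to disk. Here is a minimal sketch of that idea; the `StubVAE` and `stub_text_encoder` below are hypothetical stand-ins (in the real script these would be the Flux VAE and T5/CLIP encoders), and the file layout is an assumption, not the linked implementation.

```python
import torch
from pathlib import Path


class StubVAE(torch.nn.Module):
    """Hypothetical stand-in for the frozen VAE: maps (B, 3, H, W) images
    to (B, 4, H/8, W/8) latents, mimicking the usual 8x downsampling."""

    def encode(self, img):
        pooled = torch.nn.functional.avg_pool2d(img, 8)  # (B, 3, H/8, W/8)
        return pooled.repeat(1, 2, 1, 1)[:, :4]          # (B, 4, H/8, W/8)


def stub_text_encoder(prompt: str) -> torch.Tensor:
    """Hypothetical stand-in for the frozen text encoder: returns a
    deterministic (8, 16) embedding so the example is self-contained."""
    return torch.full((8, 16), float(len(prompt)))


@torch.no_grad()
def cache_latents(images, prompts, vae, text_encoder, out_dir):
    """Encode every (image, prompt) pair once and save the latents and
    text embeddings to disk, so training never runs the frozen models."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, (img, prompt) in enumerate(zip(images, prompts)):
        latent = vae.encode(img.unsqueeze(0)).squeeze(0)
        emb = text_encoder(prompt)
        torch.save({"latent": latent, "text_emb": emb}, out / f"{i:06d}.pt")
```

Because everything here runs under `torch.no_grad()` and only once per sample, the expensive encoders drop out of the training loop entirely.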
Using this requires changes to the data loader used during training:
https://github.com/aravindhv10/x-flux/blob/aravind_prodigy_dataset/image_datasets/dataset.py#L106
These could definitely be implemented more cleanly.
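On the dataloader side, the change amounts to reading the precomputed tensors instead of decoding and encoding images. A minimal sketch, assuming the cache layout from the script above (one `.pt` file per sample with `latent` and `text_emb` keys, which is a hypothetical layout, not necessarily the one in the linked dataset code):

```python
import torch
from pathlib import Path
from torch.utils.data import Dataset


class CachedLatentDataset(Dataset):
    """Yields precomputed VAE latents and text embeddings, so neither the
    VAE nor the text encoders run during training."""

    def __init__(self, cache_dir):
        self.files = sorted(Path(cache_dir).glob("*.pt"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        rec = torch.load(self.files[idx])
        return rec["latent"], rec["text_emb"]
```

This plugs into a standard `torch.utils.data.DataLoader` unchanged; only the transformer being trained touches the GPU per step.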
It's great, maybe you can open a merge request.