Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.

[Feat]: Latent caching approach is too slow for very large datasets. Suggesting a smarter batched approach #526

[Open] ppbrown opened this issue 1 week ago

ppbrown commented 1 week ago

Describe your use-case.

Use case is caching 1 million images to latents. On smaller datasets, the first pass typically goes slow, the second pass is faster, and the third pass is super fast; but when trying to train on 1 million images, the it/s stayed mostly the same across the phase 2 and phase 3 caching passes.

Specifically, it reported 55 it/s, 65 it/s, and then 110 it/s.

What would you like to see as a solution?

PS: a search turned up this old ticket: Latent caching too slow #181

So I'm guessing there have been some optimizations this year, but not enough for large-scale use.

I'm guessing that when the number of latents would overflow VRAM, a different caching strategy is needed. Perhaps something like the following (a rough code sketch follows the list):

  1. calculate how many latents will actually fit into VRAM, taking batch size into account
  2. start running the standard phase 1 cache steps
  3. "oops, VRAM is close to full": start doing phase 2 and phase 3 caching with what we have so far
  4. return to the next untouched dataset image, picking up after where we stopped in the prior step
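As a rough sketch of what that loop could look like (everything here is hypothetical: the phase callables, the per-latent size estimate, and the budget fraction are placeholders I made up, not OneTrainer's actual API):

```python
import torch

def cache_in_chunks(image_paths, phase1, phase2, phase3,
                    bytes_per_latent, vram_fraction=0.9):
    # Step 1: estimate how many latents fit in a slice of VRAM.
    budget = int(torch.cuda.get_device_properties(0).total_memory
                 * vram_fraction)
    per_chunk = max(1, budget // bytes_per_latent)

    start = 0
    while start < len(image_paths):
        chunk = image_paths[start:start + per_chunk]
        phase1(chunk)        # step 2: standard phase 1 caching
        phase2(chunk)        # step 3: VRAM budget reached, so finish
        phase3(chunk)        #         phases 2 and 3 for this chunk
        torch.cuda.empty_cache()
        start += len(chunk)  # step 4: resume at the next untouched image
```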

Have you considered alternatives? List them here.

No response

ppbrown commented 6 days ago

Upon more reflection, and after looking at the actual cache area used, it occurs to me that this may not be a VRAM usage problem but a cache directory problem. It's a FLAT directory?!

I'm using concepts that have the "use subdirectories" flag enabled. One approach would be to use the same directory structure that the source uses. For example, my 1 million images are split across directories like 0001/, 0002/, 0003/. That addresses the filesystem problem of directory lookups degrading when too many files share a single directory. So if the caller is taking pains to keep a subdirectory structure, maybe the cache should mirror it? (A sketch follows below.)
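A minimal sketch of that mirroring, assuming a cache root directory and a .latent.pt suffix (both invented here for illustration, not OneTrainer's real cache layout):

```python
from pathlib import Path

def mirrored_cache_path(source_root: Path, image_path: Path,
                        cache_root: Path) -> Path:
    # 0001/img.png under the source becomes 0001/img.latent.pt under
    # the cache, so no cache directory ever grows larger than its
    # source counterpart.
    rel = image_path.relative_to(source_root)
    out = cache_root / rel.with_suffix(".latent.pt")
    out.parent.mkdir(parents=True, exist_ok=True)
    return out
```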

Or just hard-code a fan-out scheme like the one the Chrome browser uses for its disk cache; a sketch of that idea is below.
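For example, a hashed fan-out in that spirit (two nesting levels derived from a path hash; this exact layout is just one plausible choice, not what Chrome actually does internally):

```python
import hashlib
from pathlib import Path

def fanout_cache_path(image_path: Path, cache_root: Path) -> Path:
    # Hash the source path and use the first hex digits as two
    # directory levels: 256 * 256 buckets keeps each directory to
    # roughly 15 files even at 1 million cached latents.
    digest = hashlib.sha1(str(image_path).encode()).hexdigest()
    out = cache_root / digest[:2] / digest[2:4] / f"{digest}.latent.pt"
    out.parent.mkdir(parents=True, exist_ok=True)
    return out
```

The upside of a hashed layout is that it never depends on how the source dataset happens to be organized; the downside is that cache entries are no longer human-browsable next to their source images.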