Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.

[Feat]: Latent caching approach is too slow for very large datasets. Suggesting a smarter batched approach #526

[Open] ppbrown opened this issue 1 week ago

ppbrown commented 1 week ago

Describe your use-case.

Use case is caching 1 million images to latents. On smaller datasets, the first pass typically goes slow, the second pass is faster, and the third pass is super fast; but when trying to train on 1 million images, the it/s stayed mostly the same across the phase 2 and phase 3 caching passes.

Specifically, it reported 55 it/s, 65 it/s, and then 110 it/s.

What would you like to see as a solution?

PS: a search turned up this old ticket: Latent caching too slow #181

So I'm guessing there have been some optimizations this year, but not enough for large-scale use.

I'm guessing that when the number of latents would overflow VRAM, a different caching strategy is needed. Perhaps something like the following (a rough code sketch follows the list):

  1. calculate how many latents will actually fit into VRAM, taking batch size into account
  2. start running the standard phase 1 cache steps
  3. "oops, VRAM is close to full": start doing phase 2 and phase 3 caching with what we have so far
  4. return to the next untouched dataset image, picking up after where we stopped in the prior step
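As a rough sketch of what that loop could look like (everything here is hypothetical: the phase callables, the per-latent size estimate, and the budget fraction are placeholders I made up, not OneTrainer's actual API):

```python
import torch

def cache_in_chunks(image_paths, phase1, phase2, phase3,
                    bytes_per_latent, vram_fraction=0.9):
    # Step 1: estimate how many latents fit in a slice of VRAM.
    budget = int(torch.cuda.get_device_properties(0).total_memory
                 * vram_fraction)
    per_chunk = max(1, budget // bytes_per_latent)

    start = 0
    while start < len(image_paths):
        chunk = image_paths[start:start + per_chunk]
        phase1(chunk)        # step 2: standard phase 1 caching
        phase2(chunk)        # step 3: VRAM budget reached, so finish
        phase3(chunk)        #         phases 2 and 3 for this chunk
        torch.cuda.empty_cache()
        start += len(chunk)  # step 4: resume at the next untouched image
```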

Have you considered alternatives? List them here.

No response

ppbrown commented 6 days ago

Upon more reflection, and after looking at the actual cache area used, it occurs to me that this may not be a VRAM usage problem but a cache directory problem. It's a FLAT directory?!

I'm using concepts that have the "use subdirectories" flag enabled. One approach would be to use the same directory structure that the source uses. For example, my 1 million images are split across directories like 0001/, 0002/, 0003/. That addresses the filesystem problem of directory lookups degrading when too many files share a single directory. So if the caller is taking pains to keep a subdirectory structure, maybe the cache should mirror it? (A sketch follows below.)
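A minimal sketch of that mirroring, assuming a cache root directory and a .latent.pt suffix (both invented here for illustration, not OneTrainer's real cache layout):

```python
from pathlib import Path

def mirrored_cache_path(source_root: Path, image_path: Path,
                        cache_root: Path) -> Path:
    # 0001/img.png under the source becomes 0001/img.latent.pt under
    # the cache, so no cache directory ever grows larger than its
    # source counterpart.
    rel = image_path.relative_to(source_root)
    out = cache_root / rel.with_suffix(".latent.pt")
    out.parent.mkdir(parents=True, exist_ok=True)
    return out
```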

Or just hard-code a fan-out scheme like the one the Chrome browser uses for its disk cache; a sketch of that idea is below.
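For example, a hashed fan-out in that spirit (two nesting levels derived from a path hash; this exact layout is just one plausible choice, not what Chrome actually does internally):

```python
import hashlib
from pathlib import Path

def fanout_cache_path(image_path: Path, cache_root: Path) -> Path:
    # Hash the source path and use the first hex digits as two
    # directory levels: 256 * 256 buckets keeps each directory to
    # roughly 15 files even at 1 million cached latents.
    digest = hashlib.sha1(str(image_path).encode()).hexdigest()
    out = cache_root / digest[:2] / digest[2:4] / f"{digest}.latent.pt"
    out.parent.mkdir(parents=True, exist_ok=True)
    return out
```

The upside of a hashed layout is that it never depends on how the source dataset happens to be organized; the downside is that cache entries are no longer human-browsable next to their source images.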