ShivamShrirao / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
https://huggingface.co/docs/diffusers
Apache License 2.0

Problem with caching latents #228

Open shadowlocked opened 1 year ago

shadowlocked commented 1 year ago

Describe the bug

Latent caching frequently stalls for me at around 1,272 latents. The caching behaviour is also completely erratic: during one training run yesterday, caching was so slow that it estimated 4 hours for 1,400 images (it usually takes about 7 minutes).

When latent caching stalls, no error messages are produced. You simply return to Colab to find that it has stopped and never reached the end of caching.

Reproduction

Try it yourself a few times; this is independent of any user-specific settings or hardware. Apart from the instance data, the data being used (the regularization images) and the settings are the same defaults I have been using since early January.

Logs

No response

System Info

Irrelevant, relates to default Colab settings in the main Shivam Dreambooth Colab.

shadowlocked commented 1 year ago

On an A100, it runs. It is possible that the demands of upstream dependencies and other factors are now so much greater than in late 2022 that the script can no longer run reliably on a T4. This increases cost 5-6x for paid users and puts it beyond the reach of free users.

Leomn1234 commented 1 year ago

I always train on the A100 so I don't have to use 8-bit Adam or gradient checkpointing; the cost is CAD $16 with taxes for 100 compute units, and the A100 uses about 13.08 units an hour. I switch to a standard GPU for inference.

You don't have to cache the latents; the Hugging Face train_dreambooth.py doesn't even have that option. In this fork you can disable it by adding --not_cache_latents.
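For anyone wondering how that flag fits in, here is a rough sketch of how a store_true flag like --not_cache_latents can gate the caching step. This is just the idea, not necessarily the fork's exact code, and build_latents_cache is a made-up helper name for illustration:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--not_cache_latents",
    action="store_true",
    help="Skip pre-encoding all training images into VAE latents before training.",
)
args = parser.parse_args()

if not args.not_cache_latents:
    # Default path: encode every image once up front and keep the latents around.
    latents_cache = build_latents_cache()  # hypothetical helper
else:
    # --not_cache_latents path: images get encoded on the fly inside the training loop.
    latents_cache = None
```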

shadowlocked commented 1 year ago

You don't have to cache the latents; the Hugging Face train_dreambooth.py doesn't even have that option. In this fork you can disable it by adding --not_cache_latents.

What are the consequences of not caching the latents? If caching is unnecessary, why is it still in the Colab?

jmaccall316 commented 1 year ago

Well, as for the reason to cache the latents:

You don't have to cache the latents; the Hugging Face train_dreambooth.py doesn't even have that option. In this fork you can disable it by adding --not_cache_latents.

What are the consequences of not caching the latents? If caching is unnecessary, why is it still in the Colab?

It is still going to learn from the latents either way, but caching might give a slight speed improvement, and it also keeps them safer than not caching: if your image folders got deleted by mistake during training, your session might survive because the latents are already held in memory. I wouldn't expect a big difference; not caching might actually work better. If you are having trouble caching the latents, it could be worth trying without it, and set your save intervals to a low number so you can monitor the results for yourself and reach your own conclusion.
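To make that concrete, here is a minimal sketch of what latent caching amounts to in a Stable Diffusion-style setup: every training image is pushed through the VAE encoder once and the resulting latents are reused at every step, whereas without caching the VAE has to re-encode the batch on each step. Names like image_tensors and training_step are illustrative, not the fork's actual identifiers:

```python
import torch

@torch.no_grad()
def cache_latents(vae, image_tensors, device="cuda"):
    """Encode every training image through the VAE once, up front,
    so the VAE never has to run again during training."""
    vae.to(device)
    cached = []
    for img in image_tensors:  # img: [3, H, W], already normalized to [-1, 1]
        latent_dist = vae.encode(img.unsqueeze(0).to(device)).latent_dist
        cached.append(latent_dist.sample() * 0.18215)  # SD latent scaling factor
    return cached  # reused every epoch; the VAE can then be moved off the GPU


def training_step(latents_or_images, vae=None, cache=True):
    if cache:
        # Cached path: the batch is already a latent tensor, just use it.
        latents = latents_or_images
    else:
        # --not_cache_latents path: re-encode the image batch every step,
        # which costs an extra VAE forward pass but skips the up-front caching loop.
        with torch.no_grad():
            latents = vae.encode(latents_or_images).latent_dist.sample() * 0.18215
    # ... add noise, predict with the UNet, compute the loss as usual ...
    return latents
```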