Stability-AI / stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models

After several successful inferences, eventually a new inference with the same settings will crash due to allocating too much memory #341

Open snuffysasa opened 8 months ago

snuffysasa commented 8 months ago

I have a very repeatable issue.

I can load the model and generate several images without any problem. But after around 20 images, when I try to generate another image with the same settings as all the ones that worked before, the program crashes.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 11.00 GiB total capacity; 8.21 GiB already allocated; 593.00 MiB free; 9.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

And then I need to restart my program / reload the model.
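For reference, this is roughly what the fragmentation hint in the traceback amounts to (a minimal sketch, not something I've verified fixes it; the 512 MiB split size is just an example value):

```python
import os

# Per the error message: cap the allocator's block split size to reduce
# fragmentation. This must be set before the first CUDA allocation,
# i.e. before the model is loaded.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import gc
import torch

def free_between_runs():
    # Drop Python references to intermediate tensors, then release cached
    # blocks back to the driver so reserved memory doesn't keep growing
    # across generations.
    gc.collect()
    torch.cuda.empty_cache()
```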

Is this expected behavior? Why do the VRAM requirements go up over time? Shouldn't it be able to check how much memory is available before crashing the program?
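Something like the following is what I had in mind for the last question (a rough sketch only; `sample()` stands in for whatever inference call is actually being made):

```python
import torch

def enough_vram(required_gib: float, device: int = 0) -> bool:
    # torch.cuda.mem_get_info reports (free, total) bytes as seen by the driver.
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return free_bytes >= required_gib * 1024**3

# Hypothetical guard around a generation call:
if enough_vram(1.0):
    images = sample()  # placeholder for the actual inference call
else:
    print("Skipping generation: less than 1 GiB of free VRAM")
```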