Memory „Leak“ when using cpu+gpu

MenKuch commented 12 months ago

My app allows the user to select different stable diffusion models, and I noticed a very strange issue concerning memory management. When using the StableDiffusionPipeline with cpu+gpu, around 1.5 GB of memory is not properly released after generateImages is called and the pipeline is released. When generating more images with a new StableDiffusionPipeline object, memory is reused and stays stable at around 1.5 GB after inference is complete. Everything, especially MLModels, are released properly. Guessing, MLModel seems to create a persistent cache.

Here is the problem: When using a different MLModel afterwards, another 1.5 GB is not released and stays resident. Using a third model, this totales to 4.5 GB of unreleased, persistent memory.

At first I thought that would be a bug in the StableDiffusionPipeline – but I was able to reproduce this behaviour in a very minimal objective-c sample without ARC:

MLArrayBatchProvider *batchProvider = [[MLArrayBatchProvider alloc] initWithFeatureProviderArray:@[<VALID FEATURE PROVIDER>]];

MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
config.computeUnits = MLComputeUnitsCPUAndGPU;

MLModel *model = [[MLModel modelWithContentsOfURL:[NSURL fileURLWithPath:<VALID PATH TO .mlmodelc SD 1.5 FILE>] configuration:config error:&error] retain];

id<MLBatchProvider> returnProvider = [model predictionsFromBatch:batchProvider error:&error];

[model release];
[config release];
[batchProvider release];

After running this minimal code, 1.5 GB of persistent memory is present that is not released during the lifetime of the app. This only happens on macOS 14(.1) Sonoma and on iOS 17(.1), but not on macOS 13 Ventura. On Ventura, everything works as expected and the memory is released when predictionsFromBatch: is done and the model is released.

Some observations:

This only happens using cpu+gpu, not cpu+ane (since the memory is allocated out of process) and not using cpu-only
It does not matter which stable diffusion model is used, I tried custom sd-derived models as well as the apple-provided sd 1.5 models
I reproduced the issue on MBP 16" M1 Max with macOS 14.1, iPhone 12 mini with iOS 17.0.3 and iPad Pro M2 with iPadOS 17.1
The memory that "leaks" are mostly huge malloc block of 100-500 MB of size OR IOSurfaces
This memory is allocated during predictionsFromBatch, not while loading the model
Loading and unloading a model does not leak memory – only when predictionsFromBatch is called, the huge memory chunk is allocated and never freed during the lifetime of the app

Does anybody have any clue what is going on? I highly suspect that I am missing something crucial, but my colleagues and me looked everywhere trying to find a method of releasing this leaked/cached memory.

MenKuch commented 12 months ago

To further illustrate the issue, here is an annotated screenshot of the memory graph:

Dalchrome commented 8 months ago

Did you find a solution to this?

MenKuch commented 8 months ago

Nope, no solution as of now. There is no way of clearing this cache that won't involve calling private methods.

Dalchrome commented 8 months ago

I’m just downloading macOS 14.4 b3 and crossing my fingers. BTW, were you getting the bug with 10-100mb extra memory left being used on every generation until crash? I did fix that one

MenKuch commented 8 months ago

No. As I said, the memory is only allocated once for each loaded MLModel. If you run inference over and over, memory stays constant. But if you want to load a SECOND model, you get double the memory usage.

Dalchrome commented 8 months ago

I got that too with the additional models, but also a smaller bump in ram, about 100mb, if I tried to unload model or set a new config, which is easy to avoid but still extra when there is already issues. I just upgraded to 14.4 b3 and things just got a lot worse. Some models are suddenly using 18-26 gb of ram in gpu mode, it seems to affect some models with 'original' encoding. Also cpu only mode is giving grey images. I only tested cpu only because there was meant to be a big performance boost in that mode for this os release. I'll submit feedback reports when I get a chance, and if that doesn't work I might end up using one of my TSIs.

apple / ml-stable-diffusion

Memory „Leak“ when using cpu+gpu #295