apple / ml-stable-diffusion

Stable Diffusion with Core ML on Apple Silicon
MIT License

Memory Crash on iPhone12 Pro Max #298

Open lamfan opened 11 months ago

lamfan commented 11 months ago

Every time it starts StableDiffusionPipeline.generateImages, it runs out of memory and the app crashes.

Device: iPhone 12 Pro Max, 6 GB memory, iOS 17. Model: coreml-stable-diffusion-v1-5-palettized_original_compiled.zip (6-bit quantized models)

let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU // with .cpuAndNeuralEngine, loading takes 3xx seconds

var config = StableDiffusionPipeline.Configuration(prompt:"a photo of an astronaut riding a horse on mars")
config.negativePrompt = ""
config.imageCount = 1
config.stepCount = 15
config.seed = UInt32(1_000_000)
config.guidanceScale = 7.5
config.disableSafety = true
config.strength = 1.0
config.targetSize = 512
config.originalSize = 512

I also set the Increased Memory Limit entitlement to YES in the entitlements file.
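
For context, here is a minimal sketch of how these pieces fit together end to end. The resource path is a placeholder, and the initializer signature is my reading of the repo's pipeline API, so verify it against the version you build with:

```swift
import CoreML
import StableDiffusion

// Placeholder path: the folder containing the unzipped compiled
// (.mlmodelc) resources.
let resourceURL = URL(fileURLWithPath: "/path/to/compiled/resources")

let mlConfig = MLModelConfiguration()
mlConfig.computeUnits = .cpuAndGPU

// Initializer signature per my reading of the repo; check it against
// the release you are using.
let pipeline = try StableDiffusionPipeline(
    resourcesAt: resourceURL,
    controlNet: [],
    configuration: mlConfig,
    disableSafety: true,
    reduceMemory: true
)
try pipeline.loadResources()

var genConfig = StableDiffusionPipeline.Configuration(
    prompt: "a photo of an astronaut riding a horse on mars")
genConfig.stepCount = 15
genConfig.seed = 1_000_000

// generateImages returns [CGImage?]; the autoreleasepool helps release
// intermediate buffers promptly on memory-constrained devices.
let images = try autoreleasepool {
    try pipeline.generateImages(configuration: genConfig) { _ in true }
}
```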

jrittvo commented 11 months ago

While the initial load time is much longer for split_einsum on NE, the memory footprint is supposed to be much lower.

I think other people are having problems with this stuff on iPhone 12s though, if you look through the issues here.

lamfan commented 11 months ago

> While the initial load time is much longer for split_einsum on NE, the memory footprint is supposed to be much lower.
>
> I think other people are having problems with this stuff on iPhone 12s though, if you look through the issues here.

OK, I'll try loading a split_einsum model with cpuAndNeuralEngine on my iPhone 12 Pro Max.

lamfan commented 11 months ago

After switching to the coreml-stable-diffusion-v1-5-palettized_split_einsum_v2_compiled.zip model and setting config.computeUnits = .cpuAndNeuralEngine, it takes 4xx seconds to load all of the models on my iPhone 12, but it crashes again in generateImages.

After more testing I found something interesting: when I use the split_einsum model with config.computeUnits = .cpuAndGPU, it successfully creates an image!


It takes all available memory to run, and if I try to generate another image, it crashes.

Here are all of my configurations. Any suggestions for improvement?

prompt: "a photo of an astronaut riding a horse on mars";
negativePrompt: "";
startingImage: nil;
strength: 1.0;
refinerStart: 0.8;
imageCount: 1;
stepCount: 15;
seed: 0;
guidanceScale: 7.5;
controlNetInputs: [];
disableSafety: true;
useDenoisedIntermediates: false;
schedulerType: StableDiffusion.StableDiffusionScheduler.pndmScheduler;
schedulerTimestepSpacing: StableDiffusion.TimeStepSpacing.linspace;
rngType: StableDiffusion.StableDiffusionRNG.numpyRNG;
encoderScaleFactor: 0.18215;
decoderScaleFactor: 0.18215;
originalSize: 512.0;
cropsCoordsTopLeft: 0.0;
targetSize: 512.0;
aestheticScore: 6.0;
negativeAestheticScore: 2.5

MenKuch commented 11 months ago

While using the Neural Engine, did you set the reduceMemory flag on the StableDiffusionPipeline?

Sadly, getting it to work reliably is incredibly tricky and highly depends on the system state in my experience.
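
For reference, as far as I can tell the flag is passed at construction time; a sketch (variable names are placeholders, and the initializer signature should be checked against your version of the repo):

```swift
let pipeline = try StableDiffusionPipeline(
    resourcesAt: resourceURL,   // folder of compiled .mlmodelc resources
    controlNet: [],
    configuration: mlConfig,    // e.g. computeUnits = .cpuAndNeuralEngine
    disableSafety: false,
    reduceMemory: true          // load each model lazily, unload after use
)
```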

lamfan commented 11 months ago

Yes, I set it to true. It just unloads the resources right after loading them, in ResourceManaging.swift:

    func prewarmResources() throws {
        try loadResources()
        unloadResources()
    }

It runs fine when initializing all the models in StableDiffusionPipeline; memory stays stable. But when I start generateImages, memory runs out on my 6 GB iPhone and it crashes. Does that mean we can't deploy on a 6 GB iPhone, or can we create lighter models that need less memory? I downloaded an app called "Draw Things" from the App Store, and it runs on my 6 GB iPhone. Does it use Core ML?

MenKuch commented 11 months ago

In my understanding, Draw Things uses MPSGraph instead of Core ML in order to work around its limitations. Here is a nice blog post by the developer, who has put considerable effort into it: https://liuliu.me/eyes/stretch-iphone-to-its-limit-a-2gib-model-that-can-draw-everything-in-your-pocket/.

You should be able to run 6-bit palettized models on 6 GB devices on GPU or ANE, so I suspect something is wrong in your setup. I would try the GPU option and use Instruments to profile memory usage, looking for any massive spikes; these give you a hint of where you are running out of memory. Alternatively, you could try the Hugging Face app for iOS to see if it works for you, and compare your code to theirs: https://github.com/huggingface/swift-coreml-diffusers

The prewarm step when using the reduceMemory option seems to be a dirty hack. The reason appears to be the ANE compiler: when you load a model for the first time, ANECompilerService is triggered to compile it for the ANE, which uses considerable memory. To ensure this happens before actually generating an image, you load the model so that it gets compiled, then unload it to free up memory again. Apple does not provide a way to determine whether a model has already been compiled for the ANE.
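
In practice that means triggering the prewarm once at a controlled time (e.g. app launch) rather than letting the ANE compilation spike happen during the first generation. A sketch, assuming `pipeline` conforms to the repo's ResourceManaging protocol:

```swift
// Run once at startup, off the main thread, so ANECompilerService does
// its work (and takes its memory spike) before the user asks for an image.
DispatchQueue.global(qos: .userInitiated).async {
    do {
        // prewarmResources() is loadResources() followed by unloadResources().
        try pipeline.prewarmResources()
    } catch {
        print("Prewarm failed: \(error)")
    }
}
```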

lamfan commented 11 months ago

Thank you so much for the reply! There's a lot of work to be done. I will try to compare the differences between my app and the Hugging Face app for iOS, as well as explore the ANE compiler. @MenKuch do you use Teams or Messenger? May I add you?

y-ich commented 9 months ago

Hi.

Here are my observations.

  1. When using .all or .cpuAndNeuralEngine, the app's own memory usage is smaller than with .cpuAndGPU, but the memory usage of "Other Processes" is larger.

I think Core ML allocates memory under "Other Processes" when using the Neural Engine, so using the Neural Engine may have no real advantage in memory usage.

  2. Loading the safety checker may crash the app when using the Neural Engine.

Since the ANE Core ML work and unloadResources run in different processes, the app tries to allocate the safety checker before the ANE Core ML process has deallocated the UNet and the other models. I think this is the main cause of the crash. You need to wait until the ANE Core ML process finishes deallocation.
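
A crude illustration of that workaround, entirely hypothetical: the delay value is a guess, and nothing here actually observes the other process, since no public API exists for that:

```swift
// Hypothetical mitigation: there is no public API to observe when the
// ANE-side Core ML process has finished deallocating the UNet, so the
// best available option is a blind wait before the safety-checker stage.
func settleANEMemory() async {
    try? await Task.sleep(nanoseconds: 2_000_000_000) // ~2 s; tune empirically
}
```

Passing disableSafety: true when constructing the pipeline sidesteps the problem entirely by never loading the safety checker.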

y-ich commented 9 months ago

From another experiment:

Core ML has an internal API, doUnloadModel:options:qos:error: (I saw it in a warning message). The timing of when Core ML calls it is quite unpredictable; Core ML may keep memory cached for reuse.

If you could call it, this issue might be solved.

MenKuch commented 9 months ago

The doUnloadModel method is a great find. Sadly, we cannot use it in App Store builds, since private methods are forbidden.