lamfan opened this issue 11 months ago
While the initial load time is much longer for split_einsum on NE, the memory footprint is supposed to be much lower.
I think other people are having problems with this stuff on iPhone 12s though, if you look through the issues here.
OK, I tried loading the split_einsum model with the cpuAndNeuralEngine configuration on my iPhone 12 Pro Max.
After switching to the coreml-stable-diffusion-v1-5-palettized_split_einsum_v2_compiled.zip model and setting config.computeUnits = .cpuAndNeuralEngine, it takes 4xx seconds to load all of the models on my iPhone 12, but it crashes again in generateImages.
After more testing, I found something interesting: when I use the split_einsum model with config.computeUnits = .cpuAndGPU, it successfully creates an image!
However, it takes all available memory to run; if I try to generate another image, it crashes.
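For reference, switching between the ANE and GPU paths comes down to the Core ML compute-units setting; the same MLModelConfiguration is then handed to the pipeline (a minimal sketch using the standard Core ML API):

```swift
import CoreML

// Pick where Core ML is allowed to schedule the models.
// .cpuAndGPU avoided the ANE crash described above, at the cost of a
// higher peak footprint; .cpuAndNeuralEngine is the low-memory path
// when it works.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU   // or .cpuAndNeuralEngine
```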
Here are all of my configuration values; any suggestions for improvement?
prompt: "a photo of an astronaut riding a horse on mars";
negativePrompt: "";
startingImage: nil;
strength: 1.0;
refinerStart: 0.8;
imageCount: 1;
stepCount: 15;
seed: 0;
guidanceScale: 7.5;
controlNetInputs: [];
disableSafety: true;
useDenoisedIntermediates: false;
schedulerType: StableDiffusion.StableDiffusionScheduler.pndmScheduler;
schedulerTimestepSpacing: StableDiffusion.TimeStepSpacing.linspace;
rngType: StableDiffusion.StableDiffusionRNG.numpyRNG;
encoderScaleFactor: 0.18215;
decoderScaleFactor: 0.18215;
originalSize: 512.0;
cropsCoordsTopLeft: 0.0;
targetSize: 512.0;
aestheticScore: 6.0;
negativeAestheticScore: 2.5
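Assuming the apple/ml-stable-diffusion Swift package, the dump above maps onto its Configuration struct roughly like this (field names taken from the dump; a sketch, not checked against every package version):

```swift
import StableDiffusion

// Mirror the key settings from the dump above.
var cfg = StableDiffusionPipeline.Configuration(
    prompt: "a photo of an astronaut riding a horse on mars")
cfg.negativePrompt = ""
cfg.imageCount = 1
cfg.stepCount = 15
cfg.seed = 0
cfg.guidanceScale = 7.5
cfg.disableSafety = true
cfg.schedulerType = .pndmScheduler
```

Note that imageCount: 1 is already the most memory-friendly choice; the other values mainly affect generation quality and time, not peak memory.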
While using the Neural Engine, did you set the reduceMemory flag on the StableDiffusionPipeline?
Sadly, getting it to work reliably is incredibly tricky and highly depends on the system state in my experience.
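For context, this is roughly where the flag goes when creating the pipeline (a sketch against the apple/ml-stable-diffusion package; the resources path is a placeholder you must point at your own compiled models):

```swift
import Foundation
import CoreML
import StableDiffusion

// Assumed path; point this at the folder containing the compiled
// .mlmodelc files (TextEncoder, Unet, VAEDecoder, ...).
let resourcesURL = URL(fileURLWithPath: "/path/to/compiled/models")

let mlConfig = MLModelConfiguration()
mlConfig.computeUnits = .cpuAndNeuralEngine

// reduceMemory: true loads each model just-in-time and unloads it
// after its stage, trading speed for a much smaller resident footprint.
let pipeline = try StableDiffusionPipeline(
    resourcesAt: resourcesURL,
    controlNet: [],
    configuration: mlConfig,
    disableSafety: true,
    reduceMemory: true)
```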
Yes, I had set it to true; it just unloads the resources when it is done, in ResourceManaging.swift:
func prewarmResources() throws {
    try loadResources()
    unloadResources()
}
It runs fine when initializing all the models in StableDiffusionPipeline; memory stays stable. But when I start generateImages, memory runs out on my 6 GB iPhone and it crashes. Does that mean we can't deploy it on a 6 GB iPhone, or can we create lighter models that need less memory? I downloaded an app called "Draw Things" from the App Store, and it can run on my 6 GB iPhone. Does it use Core ML to do this?
To my understanding, Draw Things uses MPSGraph instead of Core ML in order to work around Core ML's limitations. Here is a nice blog post by the developer, who has put considerable effort into it: https://liuliu.me/eyes/stretch-iphone-to-its-limit-a-2gib-model-that-can-draw-everything-in-your-pocket/.
You should be able to run 6-bit palettized models on 6 GB devices on the GPU or ANE, so I suspect something is wrong in your setup. I would try the GPU option and use Instruments to profile memory usage, watching for any massive spikes; these give you a hint of where you are running out of memory. Alternatively, you could try out the Hugging Face app for iOS to see whether it works on your device, and compare your code to theirs: https://github.com/huggingface/swift-coreml-diffusers
The prewarm step used with the reduceMemory option seems to be a dirty hack. The reason appears to be the ANE compiler: when you load a model for the first time, ANECompilerService is triggered to compile the model for the ANE, which uses considerable memory. To make sure this happens before actually generating an image, you load the model once so it gets compiled, then unload it to free up memory again. Apple does not provide a way to determine whether a model has already been compiled for the ANE.
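In code, that ordering looks roughly like this (a sketch against the apple/ml-stable-diffusion package; prewarmResources and generateImages are the package's public entry points):

```swift
import CoreGraphics
import StableDiffusion

// Sketch: force the ANE compile (and its memory spike) up front,
// then generate. Assumes `pipeline` was created with reduceMemory: true.
func prewarmThenGenerate(
    pipeline: StableDiffusionPipeline,
    configuration: StableDiffusionPipeline.Configuration
) throws -> [CGImage?] {
    // Loads every model once (triggering ANECompilerService) and
    // unloads it again, so compilation doesn't happen mid-generation.
    try pipeline.prewarmResources()
    return try pipeline.generateImages(configuration: configuration)
}
```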
Thank you so much for the reply! There's a lot of work to be done. I will try to compare the differences between my app and the Hugging Face app for iOS, as well as explore the ANE compiler. @MenKuch do you use Teams or Messenger? May I add you?
Hi.
Here are my observations.
I think that Core ML allocates memory under "Other Processes" when using the Neural Engine, so using the Neural Engine may have no real advantage in overall memory usage.
Since the ANE Core ML work and unloadResources run in different processes, the app tries to allocate the safety checker before the ANE Core ML process has deallocated the UNet and the other models. I think this is the main cause of the crash. You need to wait until the ANE Core ML process finishes deallocation.
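There is no official API to observe that out-of-process deallocation, so any workaround is a guess. One crude option is to pause after unloading before touching the next model (a hypothetical sketch, not an official contract; the 2-second delay is an arbitrary value to tune on-device with Instruments):

```swift
import StableDiffusion

// Hypothetical workaround: after unloading, pause briefly so the
// out-of-process ANE service has time to actually release the UNet's
// memory before the next model (e.g. the safety checker) is loaded.
func unloadAndSettle(pipeline: StableDiffusionPipeline) async throws {
    pipeline.unloadResources()
    // 2 s is a guess; tune on-device while watching memory in Instruments.
    try await Task.sleep(nanoseconds: 2_000_000_000)
}
```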
From other experience:
Core ML has an internal API, doUnloadModel:options:qos:error: (I saw it in a warning message). The timing at which Core ML calls it is quite unpredictable; Core ML may keep memory cached for reuse.
If you could call it directly, this issue might be solved.
The doUnloadModel method is a great find. Sadly, we cannot use it in App Store builds, since private methods are forbidden.
Every time it starts StableDiffusionPipeline.generateImages, it runs out of memory and the app crashes.
Device: iPhone 12 Pro Max, 6 GB memory, on iOS 17
Model: coreml-stable-diffusion-v1-5-palettized_original_compiled.zip (6-bit quantized models)
I also set the Increased Memory Limit entitlement to YES in the entitlements file.
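For reference, in the raw .entitlements plist that entitlement is the `com.apple.developer.kernel.increased-memory-limit` key (a config fragment; the surrounding plist boilerplate is omitted):

```xml
<key>com.apple.developer.kernel.increased-memory-limit</key>
<true/>
```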