divamgupta / diffusionbee-stable-diffusion-ui

Diffusion Bee is the easiest way to run Stable Diffusion locally on your M1 Mac. Comes with a one-click installer. No dependencies or technical knowledge needed.
https://diffusionbee.com
GNU Affero General Public License v3.0

Apple has just released SD for Mac #344

Open aleemb opened 1 year ago

aleemb commented 1 year ago

Apple has just released SD for Mac with better performance: https://github.com/apple/ml-stable-diffusion

[image]
arunavo4 commented 1 year ago

Requires macOS 13.1, which is currently in beta 4. Expected release: mid-December.

pressreset commented 1 year ago

From my understanding, only the Swift libs require 13.1 to integrate SD in Core ML into your Swift app. DBee uses Electron as the frontend and Python as its backend, so it should really just be a matter of converting the models to Core ML using the torch2coreml conversion and then changing how diffusionbee_backend.py is implemented.
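As a rough sketch of what that could look like on the Python side (assuming the models have already been converted with torch2coreml; the paths, input names, and shapes below are placeholders, not DiffusionBee's actual code):

import coremltools as ct
import numpy as np

# Loading an .mlpackage through coremltools compiles it on load (see the log
# later in this thread), so a backend would want to do this once and keep it in memory.
unet = ct.models.MLModel(
    "converted/Stable_Diffusion_unet.mlpackage",   # illustrative path
    compute_units=ct.ComputeUnit.ALL,              # or CPU_AND_GPU / CPU_AND_NE
)

# predict() takes a dict keyed by the input names chosen at conversion time;
# the names and shapes here are placeholders standing in for the real UNet inputs.
noise_pred = unet.predict({
    "sample": np.zeros((2, 4, 64, 64), dtype=np.float32),
    "timestep": np.array([999, 999], dtype=np.float32),
    "encoder_hidden_states": np.zeros((2, 1024, 1, 77), dtype=np.float32),
})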

Gitterman69 commented 1 year ago

https://github.com/apple/ml-stable-diffusion

pressreset commented 1 year ago

Just confirmed.

It will only build on 13.1, even with the Python distro, because the Core ML models it uses are version 7, not version 6.

RuntimeError: Error compiling model: "Error reading protobuf spec. validator error: The model supplied is of version 7, intended for a newer version of Xcode. This version of Xcode supports model version 6 or earlier.".
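For anyone who wants to confirm which spec version a converted model was written with, coremltools can read it back out. This is just a sketch; the path is a placeholder, and load_spec may need a recent coremltools to handle .mlpackage directories:

import coremltools as ct

# Read the protobuf spec of a converted model and print its version number.
spec = ct.utils.load_spec("converted/Stable_Diffusion_unet.mlpackage")
print(spec.specificationVersion)  # 7 here, while the pre-13.1 toolchain tops out at 6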

pressreset commented 1 year ago

Here is the good news/bad news.

Good news:

Bad news:

This sucks for my use cases, because I am processing video frames: not only will it falsely flag things, but some form of nudity is just going to appear in people's films at certain points, and I can't control what people are processing in my plugins/apps.

cpietsch commented 1 year ago

I am currently converting SD 2.0 on my MacBook Pro 16 M1. There is an initial warning ("!!! macOS 13.1 and newer or iOS/iPadOS 16.2 and newer is required for best performance !!!"), but so far it does its job...

1 minute 42 seconds on macOS 13.0:

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i sd2.0 -o out --compute-unit ALL --seed 93 --model-version stabilityai/stable-diffusion-2-base
WARNING:coremltools:Torch version 1.13.0 has not been tested with coremltools. You may run into unexpected errors. Torch 1.12.1 is the most recent version that has been tested.
INFO:__main__:Setting random seed to 93
INFO:__main__:Initializing PyTorch pipe for reference configuration
Fetching 12 files: 100%|████████████████████████████████████████| 12/12 [00:00<00:00, 14669.67it/s]
WARNING:__main__:Original diffusers pipeline for stabilityai/stable-diffusion-2-base does not have a safety_checker, Core ML pipeline will mirror this behavior.
INFO:__main__:Removed PyTorch pipe to reduce peak memory consumption
INFO:__main__:Loading Core ML models in memory from sd2.0
INFO:python_coreml_stable_diffusion.coreml_model:Loading text_encoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading sd2.0/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_text_encoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 15.3 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.
INFO:python_coreml_stable_diffusion.coreml_model:Loading unet mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading sd2.0/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_unet.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 118.5 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.
INFO:python_coreml_stable_diffusion.coreml_model:Loading vae_decoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading sd2.0/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_vae_decoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 5.4 seconds.
INFO:__main__:Done.
INFO:__main__:Initializing Core ML pipe for image generation
WARNING:__main__:You have disabled the safety checker for <class '__main__.CoreMLStableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
INFO:__main__:Stable Diffusion configured to generate 512x512 images
INFO:__main__:Done.
INFO:__main__:Beginning image generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 51/51 [01:42<00:00,  2.02s/it]
INFO:__main__:Saving generated image to out/a_photo_of_an_astronaut_riding_a_horse_on_mars/randomSeed_93_computeUnit_ALL_modelVersion_stabilityai_stable-diffusion-2-base.png
vicento commented 1 year ago

Apple has already converted the Stable Diffusion models 👍 into Core ML models: https://huggingface.co/apple

If you want to convert your own custom model:

Converting Models to Core ML

Step 1: Create a Python environment and install dependencies:

conda create -n coreml_stable_diffusion python=3.8 -y
conda activate coreml_stable_diffusion
cd /path/to/cloned/ml-stable-diffusion/repository
pip install -e .

Step 2: Log in to or register for your Hugging Face account, generate a User Access Token and use this token to set up Hugging Face API access by running huggingface-cli login in a Terminal window.

Step 3: Navigate to the version of Stable Diffusion that you would like to use on Hugging Face Hub and accept its Terms of Use. The default model version is CompVis/stable-diffusion-v1-4. The model version may be changed by the user as described in the next step.

Step 4: Execute the following command from the Terminal to generate Core ML model files (.mlpackage)

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker -o <output-mlpackages-directory>

WARNING: This command will download several GB worth of PyTorch checkpoints from Hugging Face.

This generally takes 15-20 minutes on an M1 MacBook Pro. Upon successful execution, the 4 neural network models that comprise Stable Diffusion will have been converted from PyTorch to Core ML (.mlpackage) and saved into the specified <output-mlpackages-directory>. Some additional notable arguments (a combined example command follows this list):

--model-version: The model version defaults to CompVis/stable-diffusion-v1-4. Developers may specify other versions that are available on Hugging Face Hub, e.g. stabilityai/stable-diffusion-2-base & runwayml/stable-diffusion-v1-5.

--bundle-resources-for-swift-cli: Compiles all 4 models and bundles them along with the necessary resources for text tokenization into <output-mlpackages-directory>/Resources, which should be provided as input to the Swift package. This flag is not necessary for the diffusers-based Python pipeline.

--chunk-unet: Splits the Unet model into two approximately equal chunks (each with less than 1GB of weights) for mobile-friendly deployment. This is required for ANE deployment on iOS and iPadOS but not for macOS. The Swift CLI is able to consume both the chunked and regular versions of the Unet model, prioritizing the former. Note that the chunked unet is not compatible with the Python pipeline, because the Python pipeline is intended for macOS only; chunking is for on-device deployment with Swift only.

--attention-implementation: Defaults to SPLIT_EINSUM which is the implementation described in Deploying Transformers on the Apple Neural Engine. --attention-implementation ORIGINAL will switch to an alternative that should be used for non-ANE deployment. Please refer to the Performance Benchmark section for further guidance.

--check-output-correctness: Compares the original PyTorch model's outputs to the final Core ML model's outputs. This flag increases RAM consumption significantly, so it is recommended only for debugging purposes.
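Putting the flags above together, a hedged example of converting a non-default model and bundling it for the Swift CLI might look like this (the output directory name is arbitrary):

python -m python_coreml_stable_diffusion.torch2coreml \
    --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker \
    --model-version stabilityai/stable-diffusion-2-base \
    --bundle-resources-for-swift-cli \
    --attention-implementation SPLIT_EINSUM \
    -o sd2-base-coreml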

7k50 commented 1 year ago

Please let us know if/when/how Apple's SD will/can be implemented in DiffusionBee. I'm assuming this will hopefully happen sooner or later?

pressreset commented 1 year ago

@vicento I've already done the build/conversion and set up a Swift project with a basic prompt on 13.1. The Apple-provided library only converts the default SD models. It will download them into ~/.cache/HuggingFace automatically and then convert. It takes about 5 minutes to convert on an M1 with 16 GB.

gingerbeardman commented 1 year ago

How much faster is this Apple version?

pressreset commented 1 year ago

@gingerbeardman The Core ML Swift package takes between 2-3 seconds to load the model when it is in Core ML format. The Python package takes anywhere from 5-9 seconds, sometimes a little longer, but it is still a significant speed increase over the existing torch MPS implementation. It also requires less memory (around 3 GB), which results in lower memory pressure overall. Generation times vary depending on the compute unit chosen: the options are CPU/GPU, CPU/NE, and ALL, and ALL is not always as fast as CPU/GPU or CPU/NE depending on the operations being performed. Either way, generation times drop to a fraction of what they were. Apple's repo can only generate 512x512 at the moment, so it is up to whoever forks their packages to implement different output sizes. As always, any pixel increase requires more memory/generation time.
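For anyone wanting to compare those compute units with the Python pipeline, the --compute-unit flag in the command shown earlier in the thread appears to follow coremltools' ComputeUnit names, so a neural-engine run would look roughly like this (hedged; check the script's --help for the exact accepted values):

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i sd2.0 -o out --compute-unit CPU_AND_NE --seed 93 --model-version stabilityai/stable-diffusion-2-base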

cpietsch commented 1 year ago

For now it looks like macOS 13.1+ is required for best performance; it can still run, slowly, on older versions.

pressreset commented 1 year ago

If I had to guess, it will probably be like the different builds that exist now for Intel vs. M-series, or there will be one build that uses the best option. I doubt Divam is going to just tell everyone "You have to upgrade to 13.1." Right now there is an M build, an M HQ build, and an Intel build.

tenko23 commented 1 year ago

...an Intel build above a certain macOS version, that is ;)

aajank commented 1 year ago

Since 13.1 is here.

godly-devotion commented 1 year ago

Okay, so I was able to quickly cobble together an app using Apple's SD implementation. You do need the latest version of Ventura, but performance looks promising (since it's all native). https://github.com/godly-devotion/MochiDiffusion

juan9999 commented 1 year ago

Mochi's performance is about 11% faster when generating 8 images on a 32 GB Max.

tenko23 commented 1 year ago

I didn't time it, but on a Mac Mini M1 with 16GB ram, it only took a fraction of the time for one image.

whosawhatsis commented 1 year ago

Using the Core ML code running on the GPU of an M1 Pro/Max with lots of RAM seems to give a small but nice improvement in speed.

The big improvements, though, come when you use the neural engine for image generation. That doesn't make it faster on these more powerful chips, but it makes the process much more efficient, using on the order of 1 GB of RAM and 5 W. This makes it a HUGE improvement for lower-power machines like the base M1/M2 models, and even the A14, which has the same 16-core neural engine, should get similar performance.

Zabriskije commented 1 year ago

I've tested the speed difference on a MacBook Air M1 (8-core CPU/GPU, 8 GB RAM) between DiffusionBee and Mochi Diffusion (both single image, 30 steps, 512x512, Anything v3.0; Mochi with CPU/NE):

That’s a pretty nice jump.