apple / ml-stable-diffusion

Stable Diffusion with Core ML on Apple Silicon

sdxl inference is too slow (0.02 step/sec) on m2 macbook air with macos14 #271

Open 0ihsan opened 1 year ago

0ihsan commented 1 year ago

I am using StableDiffusionCLI with computeUnits = .cpuAndGPU and this specific model: https://huggingface.co/apple/coreml-stable-diffusion-mixed-bit-palettization/tree/main/coreml-stable-diffusion-xl-base_mbp_4_50_palettized/compiled

$ ...
Loading resources and creating pipeline
(Note: This can take a while the first time using these resources)
Step 9 of 11  [mean: 0.02, median: 0.03, last 0.03] step/sec

It finishes generating the image, but the result is not usable: it has major flaws (that's a separate issue).
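
For context, here is a minimal Swift sketch of roughly how a pipeline gets set up with those compute units. The resource path, prompt, and step count are placeholders, and exact type and parameter names can differ between versions of the Swift package, so treat this as an illustration rather than the CLI's exact code:

```swift
import CoreML
import Foundation
import StableDiffusion

// Placeholder path to the downloaded *compiled* SDXL resources.
let resourceURL = URL(fileURLWithPath: "/path/to/coreml-stable-diffusion-xl-base_mbp_4_50_palettized/compiled")

// Same compute-unit choice as above: CPU + GPU.
let mlConfig = MLModelConfiguration()
mlConfig.computeUnits = .cpuAndGPU

// reduceMemory trades some speed for a smaller footprint; likely relevant on an 8 GB machine.
let pipeline = try StableDiffusionXLPipeline(
    resourcesAt: resourceURL,
    configuration: mlConfig,
    reduceMemory: true
)
try pipeline.loadResources()

var config = StableDiffusionXLPipeline.Configuration(prompt: "a photo of an astronaut riding a horse")
config.stepCount = 11

let images = try pipeline.generateImages(configuration: config) { progress in
    print("Step \(progress.step) of \(progress.stepCount)")
    return true // returning false would cancel generation
}
```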

$ sw_vers
ProductName:            macOS
ProductVersion:         14.0
BuildVersion:           23A344

Memory: 8 GB

Also tried:

atiorh commented 1 year ago

Hello @0ihsan, this looks like the same point of confusion as #267: the model you pointed to was converted with --attention-implementation ORIGINAL, which is compatible with --compute-units cpuAndGPU, whereas the newly published coreml-stable-diffusion-xl-base-ios uses --attention-implementation SPLIT_EINSUM, which is compatible with --compute-units cpuAndNeuralEngine. We are considering reorganizing the published model folders to make this clearer. (cc: @pcuenca)
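
To make that mapping concrete, here is a small Swift sketch (not code from the repo) of the compatibility rule described above:

```swift
import CoreML

// Match the Core ML compute units to the attention implementation
// the model was converted with.
enum AttentionImplementation {
    case original     // ORIGINAL: runs best on CPU + GPU
    case splitEinsum  // SPLIT_EINSUM: runs best on CPU + Neural Engine
}

func preferredComputeUnits(for attention: AttentionImplementation) -> MLComputeUnits {
    switch attention {
    case .original:    return .cpuAndGPU
    case .splitEinsum: return .cpuAndNeuralEngine
    }
}
```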

TimYao18 commented 11 months ago

For your reference: I tested with DreamShaper XL 1.0 on a MacBook Air M2, and 25 steps took about 290 seconds, i.e. roughly 0.08 step/sec.