apple / ml-stable-diffusion

Stable Diffusion with Core ML on Apple Silicon
MIT License

image2image trace traps on generateImages with 320x512 model #128

Open pj4533 opened 1 year ago

pj4533 commented 1 year ago

I converted a model using:

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-vae-encoder --convert-safety-checker --bundle-resources-for-swift-cli --latent-w 40 --latent-h 64 --attention-implementation ORIGINAL --model-version stabilityai/stable-diffusion-2-base -o model_output

That should give me a model for 320x512 images. I verified that text2image works fine and produces a 320x512 output image. (I switched to cpuAndGPU because of the ORIGINAL attention implementation's restriction for non-square sizes.)
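For anyone sanity-checking the flags: `--latent-w`/`--latent-h` are in latent-space units, and the Stable Diffusion VAE downsamples by a factor of 8, so latent dimensions map to pixel dimensions as latent * 8. A quick sketch (the helper is hypothetical, not part of the repo):

```swift
// Hypothetical helper for sanity-checking --latent-w/--latent-h flags.
// The Stable Diffusion VAE downsamples by 8, so pixels = latent * 8.
func pixelSize(latentW: Int, latentH: Int) -> (width: Int, height: Int) {
    (width: latentW * 8, height: latentH * 8)
}

let size = pixelSize(latentW: 40, latentH: 64)
print("latent 40x64 -> \(size.width)x\(size.height) pixels")  // 320x512
```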

However, if I give it a 320x512 startingImage to do image2image, I get a trace trap.

Did I miss something?

edavidk7 commented 1 year ago

Same issue with Stable Diffusion 2.1 base, precompiled from Hugging Face, using SplitEinsum on all compute devices. It trace traps even when feeding it a previously generated image from text2image.

swift run StableDiffusionSample --image photo3.jpg "An infinite blackboard" --resource-path /Users/davidkorcak/Documents/applediffusion/models/coreml-stable-diffusion-2-1-base/split_einsum/compiled --compute-units all --seed 93 --output-path ./
Building for debugging...
Build complete! (0.12s)
Loading resources and creating pipeline
(Note: This can take a while the first time using these resources)
Sampling ...
StableDiffusion/Encoder.swift:96: Fatal error: Unexpectedly found nil while unwrapping an Optional value
[1]    28368 trace trap  swift run StableDiffusionSample --image photo3.jpg  --resource-path   all  93
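The fatal error at Encoder.swift:96 is a force-unwrap (`!`) hitting nil: whatever optional is built on that line (presumably a converted or resized image buffer) comes back nil for this input. An illustrative sketch of the failure mode and the safer guard-let pattern (this mimics the symptom only, it is not the actual encoder code):

```swift
// Illustrative only: mimics the failure mode, not the real Encoder.swift.
// Force-unwrapping a nil optional is what produces
// "Fatal error: Unexpectedly found nil while unwrapping an Optional value".
struct EncodingError: Error { let message: String }

func latentBuffer(for sizeKey: String, in cache: [String: [Float]]) throws -> [Float] {
    // Safer alternative to `cache[sizeKey]!`: throw a recoverable error.
    guard let buffer = cache[sizeKey] else {
        throw EncodingError(message: "no buffer prepared for \(sizeKey)")
    }
    return buffer
}

let cache = ["512x512": [Float](repeating: 0, count: 4)]
do {
    // "320x512" is missing, so this throws instead of trace-trapping.
    _ = try latentBuffer(for: "320x512", in: cache)
} catch {
    print("recoverable error instead of a trace trap")
}
```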

pj4533 commented 1 year ago

I stopped pursuing other aspect ratios for now. I just process at 512x512 (centering my image if it's 9:16), then upres the output with a second pass and crop out the sides as a last step. It works well for image2image video processing, but not so well for generative work, since the image starts to spread outside the 9:16 middle.
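For reference, the arithmetic behind that workaround: fit the 9:16 region to the full height of the 512x512 square, center it horizontally, and crop it back out at the end. A minimal sketch (hypothetical helper, not from the repo):

```swift
// Hypothetical helper for the workaround above: the centered crop
// rectangle for a given aspect ratio inside a square canvas.
func centeredCrop(side: Int, aspectW: Int, aspectH: Int) -> (x: Int, width: Int) {
    // Use the full height of the square, scale width to the aspect ratio.
    let width = side * aspectW / aspectH
    return (x: (side - width) / 2, width: width)
}

let crop = centeredCrop(side: 512, aspectW: 9, aspectH: 16)
// 512 * 9 / 16 = 288, so the 9:16 region is 288x512 starting at x = 112.
print("crop \(crop.width)x512 at x=\(crop.x)")
```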

edavidk7 commented 1 year ago

I tried feeding it back a 512x512 previously generated image, and it still trace traps. There seems to be an issue in the encoder code: I printed the shape of the encoded image directly and it appears correct, yet it still fails at line 96 of Encoder.swift.
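In case it helps with the shape debugging: under the usual Stable Diffusion convention the VAE encoder output is (batch, 4 latent channels, height/8, width/8), so a 320x512 input should encode to (1, 4, 64, 40). A hypothetical sanity check (not part of the repo's API):

```swift
// Hypothetical sanity check: expected Stable Diffusion VAE encoder
// output shape is (batch, 4 latent channels, height/8, width/8).
func expectedLatentShape(width: Int, height: Int) -> [Int] {
    [1, 4, height / 8, width / 8]
}

print(expectedLatentShape(width: 320, height: 512))  // [1, 4, 64, 40]
```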