apple / ml-stable-diffusion

Stable Diffusion with Core ML on Apple Silicon

Swift generation produces different style/quality images compared to other SD tools #90


kasima commented 1 year ago

I've been experimenting with image generation in Swift using the converted Core ML models. It seems to produce different-style (and noticeably worse?) images than other Stable Diffusion tools for a given model version and set of generation parameters. The Python CLI generation with the converted Core ML models seems to produce images in the same vicinity as those from the other tools.

I'm new to the AI image generation space and would much appreciate any help with a few questions.

Here's what I've been looking at:

Parameters

Common settings across tools: the prompt below, 30 inference steps, guidance scale 10 (per the generation commands that follow).

| Tool | Sample outputs (image attachments omitted) |
| --- | --- |
| DreamStudio | 4 images (filename seeds 2572197474, 2981633817, 1465467626, 1985145463) |
| Google Colab | 4 images |
| DiffusionBee (local) | 4 images |
| InvokeAI (local) | 4 images |
| Python CLI (local Core ML) | 4 images (filenames: random seeds 93–96, compute unit ALL, runwayml/stable-diffusion-v1-5) |
| Swift CLI (local Core ML) | 4 images (filenames: seed 93, guidance scale 10, 30 steps) |
Python CLI generation command

Pre-converted model from Hugging Face:

```bash
python -m python_coreml_stable_diffusion.pipeline \
  --prompt "personification of Halloween holiday in the form of a cute girl with short hair and a villain's smile, cute hats, cute cheeks, unreal engine, highly detailed, artgerm digital illustration, woo tooth, studio ghibli, deviantart, sharp focus, artstation, by Alexei Vinogradov bakery, sweets, emerald eyes" \
  -i /Users/kasima/src/huggingface/apple/coreml-stable-diffusion-v1-5/original/packages \
  -o /Users/kasima/scratch \
  --compute-unit ALL \
  --model-version "runwayml/stable-diffusion-v1-5" \
  --num-inference-steps 30 \
  --guidance-scale 10
```

Swift CLI generation command

Pre-converted model from Hugging Face:

```bash
swift run StableDiffusionSample "personification of Halloween holiday in the form of cute girl with short hair and a villain's smile, cute hats, cute cheeks, unreal engine, highly detailed, artgerm digital illustration, woo tooth, studio ghibli, deviantart, sharp focus, artstation, by Alexei Vinogradov bakery, sweets, emerald eyes" \
  --negative-prompt "" \
  --resource-path /Users/kasima/src/huggingface/apple/coreml-stable-diffusion-v1-5/split_einsum/compiled/ \
  --output-path /Users/kasima/scratch/swiftcli/comparison \
  --step-count 30 \
  --guidance-scale 10 \
  --image-count 4
```

GuiyeC commented 1 year ago

@kasima Maybe this is a silly question, but I see no mention of the seeds used to generate these images. Did you use the same seeds to generate each column in these examples? And if you did, could you share them so we can try to replicate these results?

H1p3ri0n commented 1 year ago

The issue is that this Core ML implementation currently supports only two samplers. The usual SD tools all support other samplers, such as Euler, which I've found produce great results.

We'll have to wait until Apple implements other samplers in this codebase.
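
For context, most of the other tools are diffusers-based, where the sampler is a swappable scheduler object. A minimal sketch of that swap, assuming the standard diffusers Python API (this is not something the Core ML pipelines in this repo expose):

```python
# Sketch, assuming the standard diffusers API: swap the default PNDM
# scheduler for Euler, the kind of sampler choice other SD tools expose.
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
# Subsequent pipe(...) calls now sample with Euler instead of PNDM.
```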

kasima commented 1 year ago

@GuiyeC – All the images were generated with random seeds (updated in the original post). The images in the columns aren't necessarily related to each other; the columns were just used for formatting. However, it's an interesting idea to keep the seeds the same. I'll try that when I get a chance.

@timevision – So the samplers/schedulers might have something to do with it? I believe at least a few of these tools use the default PNDM scheduler (Google Colab, the Python CLI, the Swift CLI, and probably DiffusionBee as well). I'll confirm and try regenerating with the same scheduler.
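
For the diffusers-based tools in the comparison (Google Colab, InvokeAI, etc.), the seed can be pinned with a generator so a run is directly comparable to a Core ML CLI run that used the same seed. A minimal sketch, assuming the standard diffusers API; the truncated prompt and output filename are placeholders:

```python
# Sketch, assuming the standard diffusers API: fix the RNG seed so the
# output can be compared against a Core ML CLI run with the same seed
# (e.g. seed 93, which appears in the Swift CLI filenames above).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
generator = torch.Generator(device="cpu").manual_seed(93)  # fixed seed

image = pipe(
    "personification of Halloween holiday ...",  # full prompt as in the commands above
    num_inference_steps=30,
    guidance_scale=10,
    generator=generator,
).images[0]
image.save("diffusers_seed93.png")  # placeholder output name
```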