cassiebreviu / StableDiffusion

Inference Stable Diffusion with C# and ONNX Runtime
MIT License

Numeric instability in LMS #5

jlami closed this 1 year ago

jlami commented 1 year ago

Hi,

When I run with other values of num_inference_steps, for instance 30, I get very grainy results.

I am trying to compare the numbers Python produces, and many of the C# calculations don't match them. It seems that numpy and torch use doubles internally for computations and only store floats in the resulting arrays. This might have some influence, but I have not tracked it down.
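A minimal Python sketch of the precision effect described above (stdlib only; `to_f32`, `accumulate_f32` and `accumulate_f64` are hypothetical helpers, not part of numpy, torch or this repo). Python floats are IEEE doubles, so float32 storage is emulated by round-tripping through `struct`. Rounding to float32 after every addition drifts noticeably, while accumulating in double and rounding once at the end does not. This only shows that the two strategies can disagree; it does not show that precision is what produces the grainy output.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate_f32(values):
    """Emulate float32 arithmetic: round the running total after every addition."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


def accumulate_f64(values):
    """Accumulate in double precision and round to float32 only once, when storing."""
    return to_f32(sum(values))


values = [0.1] * 100_000  # the exact sum would be 10000
print("float32 at every step:", accumulate_f32(values))
print("float64, stored once: ", accumulate_f64(values))
```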

I am running this on .NET Framework 4.8, but I don't think that makes any difference.

The prompt "portrait photo of a old warrior chief" with a seed of 0 gives me the following image:

[image: sample]

I don't know enough about schedulers to say whether a different algorithm would fix this, or whether there is an easy way to test if LMS will converge.

Could you try a higher number of inference steps and see if this is reproducible?

cassiebreviu commented 1 year ago

I played around with this, and I think it comes down to tuning the prompt, steps and seed to get the result you want. When I built the scheduler, I tested each function in it against Python, checking that the same inputs gave the same outputs in both languages, both to validate that I had implemented the logic correctly and to debug issues while getting it to work.

I tested this prompt with a larger random seed and 10 steps and got a very high-resolution photo. Can you try changing your prompt to "hd portrait photo of a old warrior chief" and see how the result changes? Play with the seed and steps to get the result you want.

See my testing below:

Example: 10 steps, large random seed, prompt "portrait photo of a old warrior chief"
[image: sample-10-steps-randomseed]

Example: 29 steps, seed 0, prompt "portrait photo of a old warrior chief"
[image: sample-29-steps-old-seed0]

Example: 29 steps, seed 0, prompt "hd portrait photo of a old warrior chief"
[image: sample-29steps-hd-seed0]

I think "old" can also suggest an old photo, and therefore a lower resolution.

Hope this helps! Thanks!

jlami commented 1 year ago

I think I found it: you store the derivatives in reversed order here:

https://github.com/cassiebreviu/StableDiffusion/blob/fabfe663d68badc545f1635f39c450074b2998e5/StableDiffusion/LMSDiscreteScheduler.cs#L227-L230

Only the sequence passed to the zip should be reversed; the stored list should keep its original order. I'll make a pull request with the fix.
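For comparison, the Python LMS scheduler in Hugging Face diffusers (`LMSDiscreteScheduler.step`) keeps `self.derivatives` oldest-first (append, then `pop(0)` once the list exceeds `order`) and only reverses at the point of use, `zip(lms_coeffs, reversed(self.derivatives))`, so the first coefficient always pairs with the newest derivative. The sketch below is a minimal stand-alone Python model of that bookkeeping. It assumes the C# bug amounts to reversing the stored list in place (in C#, for example, `List<T>.Reverse()` mutates the list, whereas LINQ's `Enumerable.Reverse` returns a new sequence); the function names are hypothetical and the actual code at the linked lines may differ. Derivatives are stand-in string labels so the ordering is easy to read.

```python
def newest_first_correct(history, derivative, order=4):
    """Keep `history` oldest -> newest and hand a reversed copy to the caller."""
    history.append(derivative)
    if len(history) > order:
        history.pop(0)  # drop the oldest derivative
    return list(reversed(history))  # newest -> oldest; `history` itself is untouched


def newest_first_buggy(history, derivative, order=4):
    """Same bookkeeping, but the stored list is reversed in place (the assumed bug)."""
    history.append(derivative)
    if len(history) > order:
        history.pop(0)  # after earlier in-place reversals this is no longer the oldest
    history.reverse()  # mutates the stored history, so its order flips every step
    return list(history)


good, bad = [], []
for d in ["d0", "d1", "d2", "d3", "d4"]:  # derivatives from successive steps
    print(d, newest_first_correct(good, d), newest_first_buggy(bad, d))
```

Both versions agree for the first two steps. From the third step on, the buggy version pairs the second coefficient with d0 instead of d1, and at the fifth step its `pop(0)` discards d3, the most recent previous derivative, so the multistep update no longer follows the intended path.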

This is the image I get with the fix:

[image: sample]