TensorStack-AI / OnnxStack

C# Stable Diffusion using ONNX Runtime
Apache License 2.0

Vectorize most of TensorHelper #15

Closed: jdluzen closed this 1 year ago

jdluzen commented 1 year ago

Improves processing time by ~4%. Tested with LCM fp16: 512x512 with 6 steps takes 9xx ms on an AMD 6900XT. 🤯
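
For context, a minimal sketch of what vectorizing an element-wise tensor op can look like with `System.Numerics.Vector<float>`; the helper name and shape are illustrative, not the actual code in this PR:

```csharp
using System;
using System.Numerics;

public static class TensorHelperSketch
{
    // Illustrative: scale a float buffer in place, processing
    // Vector<float>.Count lanes per iteration with a scalar tail loop.
    public static void MultiplyInPlace(Span<float> values, float scale)
    {
        var scaleVector = new Vector<float>(scale);
        int lanes = Vector<float>.Count;
        int i = 0;
        for (; i <= values.Length - lanes; i += lanes)
        {
            var v = new Vector<float>(values.Slice(i, lanes));
            (v * scaleVector).CopyTo(values.Slice(i, lanes));
        }
        // Elements that don't fill a full vector
        for (; i < values.Length; i++)
            values[i] *= scale;
    }
}
```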

saddam213 commented 1 year ago

Awesome, there are lots of optimizations begging to be done, so thank you for this.

Also, you're going to be in for a lot of work fixing my typos, LOL :p

saddam213 commented 1 year ago

Might leave this one open a bit longer and use it to fold in some other optimizations.

I started making a set of mutate functions in TensorHelper; it might be time to get these in place. They will save a few unneeded loops, speeding things up a bit.
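
A minimal sketch of the mutate-style idea, assuming `DenseTensor<float>.Buffer` from Microsoft.ML.OnnxRuntime.Tensors exposes the backing memory; the method name is hypothetical:

```csharp
using Microsoft.ML.OnnxRuntime.Tensors;

public static class TensorMutateSketch
{
    // Illustrative mutate helper: scales the tensor's backing buffer in place
    // instead of allocating and filling a new result tensor.
    public static DenseTensor<float> MultiplyTensorInPlace(DenseTensor<float> tensor, float value)
    {
        var buffer = tensor.Buffer.Span;
        for (int i = 0; i < buffer.Length; i++)
            buffer[i] *= value;
        return tensor;
    }
}
```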

saddam213 commented 1 year ago

A small tip, if you haven't already found it: OnnxStack.Console example 1 (StableDebug) will spit out a set of images for each scheduler using the same seed. It's good for testing whether any schedulers have changed dramatically, as the images should match after code changes.

Those images are on the repo's main page.

jdluzen commented 1 year ago

Nice, I had been doing something similar with a specific seed and phrase. For this one, I was surprised it worked on the first try, so I went back and purposefully broke it to make sure 😅

saddam213 commented 1 year ago

Yeah, it can be easy to break, but if all the images come out different yet better, that's also worth investigating, because my original math could be wrong. I'm new to this and still getting my head around it all.

But for the last 7 builds those images have been consistent, so I "feel" my math is correct.

saddam213 commented 1 year ago

Some operations may be worth making into their own specific methods.

For example, (tensor1 * x1) + (tensor2 / x1), which would currently be 3 loops, could perhaps be made into one function.

It would clean up the Scheduler code too; for example, AddNoise shares the same calculation across some of the Schedulers.
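
A rough sketch of what such a fused helper could look like; the method name is hypothetical, and it assumes `DenseTensor<float>.Buffer` and the `Dimensions`-based constructor from Microsoft.ML.OnnxRuntime.Tensors:

```csharp
using Microsoft.ML.OnnxRuntime.Tensors;

public static class TensorFusedSketch
{
    // Illustrative fused helper for (tensor1 * x1) + (tensor2 / x1),
    // computed in a single pass instead of three separate loops/allocations.
    public static DenseTensor<float> MultiplyAddDivide(
        DenseTensor<float> tensor1, DenseTensor<float> tensor2, float x1)
    {
        var result = new DenseTensor<float>(tensor1.Dimensions);
        var a = tensor1.Buffer.Span;
        var b = tensor2.Buffer.Span;
        var r = result.Buffer.Span;
        for (int i = 0; i < r.Length; i++)
            r[i] = (a[i] * x1) + (b[i] / x1);
        return result;
    }
}
```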

saddam213 commented 1 year ago

Looks like we may have to change TensorHelper to be generic, so I will just grab these changes now and revisit optimizations another day :)
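
One possible direction for a generic TensorHelper, sketched with .NET 7 generic math (`INumber<T>`); this is only an assumption about the approach, and element types like ONNX Runtime's Float16 may not implement the numeric interfaces, so it would need checking:

```csharp
using System.Numerics;
using Microsoft.ML.OnnxRuntime.Tensors;

public static class GenericTensorHelperSketch
{
    // Illustrative generic helper: one code path for any element type
    // that implements INumber<T> (e.g. float, double).
    public static DenseTensor<T> MultiplyTensor<T>(DenseTensor<T> tensor, T value)
        where T : INumber<T>
    {
        var result = new DenseTensor<T>(tensor.Dimensions);
        var src = tensor.Buffer.Span;
        var dst = result.Buffer.Span;
        for (int i = 0; i < src.Length; i++)
            dst[i] = src[i] * value;
        return result;
    }
}
```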