BradLarson / GPUImage3

GPUImage 3 is a BSD-licensed Swift framework for GPU-accelerated video and image processing using Metal.
BSD 3-Clause "New" or "Revised" License

Step and smoothStep #39

Open jonkai opened 5 years ago

jonkai commented 5 years ago

Hi Brad, I noticed a "bug" in Apple's Metal that appears to affect only the iPhone XS (probably the iPhone XR too) and possibly the iPhone X, i.e. the newer CPU/GPU chips. I'm sure Apple will not call it a bug, but when two devices produce different outputs, something needs to be fixed.

If you use smoothstep or step (and possibly many other math functions) in a Metal shader with the "half" type rather than the "float" type, you get a far different result than when running the same code on, say, an iPhone 7 Plus or iPhone 6s.

To fix this, you can change every smoothstep/step call from something like the line below, which you have in some shaders:

half blendValue = smoothstep(half(uniform.thresholdSensitivity), half(uniform.thresholdSensitivity + uniform.smoothing), distance(half2(Cr, Cb), half2(maskCr, maskCb)));

to this:
float blendValue = smoothstep(float(uniform.thresholdSensitivity), float(uniform.thresholdSensitivity + uniform.smoothing), distance(float2(Cr, Cb), float2(maskCr, maskCb)));

With that change, the results are about the same on the iPhone 7 Plus and the iPhone XS.
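For anyone who wants to see how much half-precision rounding alone can move a smoothstep result, here is a rough CPU-side sketch in Python. It does not reproduce the device-specific difference described above; it only emulates half arithmetic by rounding after every step with the standard `struct` module's `e` (binary16) format. The helper names (`to_half`, `max_half_error`) and the values 0.4 and 0.1 standing in for `thresholdSensitivity` and `smoothing` are assumptions made for illustration, and the textbook Hermite formula is used rather than whatever any particular GPU implements.

```python
import struct

def to_half(x):
    """Round x to the nearest IEEE 754 binary16 value (returned as a Python float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def smoothstep(edge0, edge1, x, rnd=lambda v: v):
    """Textbook Hermite smoothstep: t*t*(3 - 2*t) with t clamped to [0, 1].

    `rnd` is applied to the inputs and after every arithmetic step, so passing
    `to_half` emulates evaluating the whole expression in half precision.
    """
    edge0, edge1, x = rnd(edge0), rnd(edge1), rnd(x)
    t = rnd(rnd(x - edge0) / rnd(edge1 - edge0))
    t = min(max(t, 0.0), 1.0)
    return rnd(rnd(t * t) * rnd(3.0 - rnd(2.0 * t)))

def max_half_error(edge0, edge1, samples=1000):
    """Largest |full - half| smoothstep difference over evenly spaced x in [edge0, edge1]."""
    worst = 0.0
    for i in range(samples + 1):
        x = edge0 + (edge1 - edge0) * i / samples
        full = smoothstep(edge0, edge1, x)
        half = smoothstep(edge0, edge1, x, to_half)
        worst = max(worst, abs(full - half))
    return worst

# Hypothetical values standing in for thresholdSensitivity and smoothing.
threshold_sensitivity, smoothing = 0.4, 0.1
print(max_half_error(threshold_sensitivity, threshold_sensitivity + smoothing))
```

If the difference printed for your own threshold and smoothing values is small, rounding by itself probably does not explain a large visual difference between devices, which would point toward the GPU's half implementation of the function rather than the inputs.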

BradLarson commented 5 years ago

Interesting. Could that possibly be the cause of the issue reported here?

https://github.com/BradLarson/GPUImage3/issues/33

I briefly tried to track that one down, without success. Neither Janie nor I have had much free time lately, so we're way behind on this, but I really want to get this going in the next few weeks.

jonkai commented 5 years ago

Hmm, I thought my comment had landed here, so here it is again: it is very possible that this is the issue. It could even affect more than smoothstep/step, but so far those are the only places where I have noticed it.

jonkai commented 5 years ago

I did not check whether it only happens when the input numbers are large, or whether the results would be about the same when the inputs stay in a more normal range that a "half" can represent, since in what I was using it for those values can vary wildly most of the time.
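To get a feel for what "a range that a half could use" means, the following Python sketch prints how coarse half precision becomes as values grow. It relies only on the standard `struct` module's `e` (binary16) format; the helper names `to_half`, `half_spacing` and `fits_in_half` are made up for this example. Note that Python raises `OverflowError` for values beyond the half range, which is used here only as a range check and says nothing certain about what a given GPU does in that case.

```python
import struct

def to_half(x):
    """Round x to the nearest IEEE 754 binary16 value (returned as a Python float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def half_spacing(x):
    """Gap between the half value nearest to x and the next larger half value.

    Assumes x is positive and below the largest finite half value (65504).
    """
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    lower = struct.unpack("<e", struct.pack("<H", bits))[0]
    upper = struct.unpack("<e", struct.pack("<H", bits + 1))[0]
    return upper - lower

def fits_in_half(x):
    """True if x can be rounded to a finite half value by struct's 'e' format."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False

# Print value, its half-rounded form, and the local spacing of half values.
for value in (0.5, 1.0, 255.0, 2048.0, 30000.0):
    print(value, to_half(value), half_spacing(value))

print(fits_in_half(70000.0))
```

Inputs normalized to roughly 0 to 1 keep about three decimal digits in half, whereas values in the thousands lose all fractional detail, so checking which range the shader inputs actually fall in would help narrow down the cause.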