gyagp commented 8 years ago

Launch aquarium within browser (such as Chromium and FireFox), and change its view to some outside ones (such as outside 1 and outside original), we may see a lot of graphics artifacts. artifact-1 artifact-2

The possible root cause is as below: Some vectors, such as v_tangent, v_binormal and v_normal are interpolated from vertex shader. According to spec, these vectors have default precision highp in vertext shader. But after interpolated into fragment shader, float is designated to have default precision mediump. In some graphics driver, mediump is implemented as 16-bit float, while highp is 32-bit float, which will result in accuracy loss. Even with the function normalize() later on, there is a possibility 16-bit is not accurate enough, then 32-bit float is required.

I manually changed all the designated default precision in fragment shader to highp, and the graphics artifacts seem gone. no-artifact

I will send out a PR, and any comments are welcome!

greggman commented 8 years ago

It's mediump for a reason. Most mobile devices only support mediump. Changing it to highp will mean it no longer runs on phones.

What's your GPU/OS/Driver? Can you post the contents of "about:gpu" here?

gyagp commented 8 years ago

Thanks for the prompt comment! This is tested with Intel SKL GPU (GEN9) on Linux with proprietary driver (not Mesa). I also tried to repro this on some publicly available devices, but I couldn't find one today. I will get back to you if I can find such a device in following days.

However, I did more experiments on this case, and below are some findings.

The issue is caused by "outerRefractionMapFragmentShader". If I change the precision to highp in this shader, the artifacts are gone.
Moreover, it seems to me gl_FragColor causes the precision problem. The last line of this shader is as this: gl_FragColor = vec4(mix(skyColor, reflectColor, ((r + 0.3) * (reflection.r))).rgb, 1.0 - r); Is it possible gl_FragColor needs 32-bit precision rather than 16-bit? I qualified either skyColor or reflectColor with highp, the issue is gone.

You mentioned highp is not supported by some mobile devices. If you can confirm the issue, can you accept a fix as below in "outerRefractionMapFragmentShader"? -precision mediump float; +#ifdef GL_FRAGMENT_PRECISION_HIGH

precision highp float; +#else
precision mediump float; +#endif

gyagp commented 8 years ago

The fix seems to be wrongly formatted by github. I mean the change as this:

ifdef GL_FRAGMENT_PRECISION_HIGH

precision highp float;

else

precision mediump float;

endif

greggman commented 8 years ago

Yea, let's try that.

BTW: you can format code in github either by indenting 4 spaces

----+ Indent 4 spaces
    V
    code

or by using triple backticks

code

gyagp commented 8 years ago

Thanks for the tips to format code, which is very helpful! I tried some more devices at hand, but still couldn't find anyone to reproduce this. Please give me several days as I'm double checking with Intel gfx team to ensure it's not a driver issue.

gyagp commented 8 years ago

Sorry that it has been about a week since my last update. But in past days, I have been working hard on this to do more experiments and have more checks with our (Intel) graphics team. Now the root cause is almost clear, so I want to discuss with you about solution. First let me state the root cause:

Since Intel 9th generation GPU (such as in SkyLake), it has a hardware feature to support 16-bit float. This can be called as half-precision floating-point (https://en.wikipedia.org/wiki/Half-precision_floating-point_format and https://software.intel.com/en-us/articles/performance-benefits-of-half-precision-floats), which is also known as float16, half-float, FP16, etc. For convenience, let's use FP16 for following description.
FP16 can bring many benefits, especially for performance and power.
The root cause of this issue is due that we lose some precision for FP16 multiplication. Last time, I figured out function mix() is the reason for the artifacts. In below experiment, I take a step further to mimic the behavior of it. The definition and implementation in shader compiler of mix() is: mix(x, y, a) == x * (1 - a) + y * a == x + a * (y - x). Then I changed code as below (pseudo code):

highp float one = 1.0;
mediump float a = <valueA>;
mediump float x = <valueX>;
mediump float y = <valueY>;
mediump vec4 tmp1 = y - x;
mediump vec4 tmp2 = one * tmp1 * a; // good (A)
//mediump vec4 tmp2 = tmp1 * a * one; // bad (B)
//mediump vec4 tmp2 = tmp1 * a; // bad (C)
mediump vec4 mixResult = x + tmp2;

In above code, if I use (A), the result is good, and I got artifacts with (B) or (C). I think this proves FP16 multiplication causes the precision loss, and if we manually promote it to 32-bit (one * tmp1), the issue can be fixed.

I checked GLES spec, and it has a very loose requirement on float-point computation: (GLES 3.0.4, section 2.1.1, Floating-Point Computation) We require simply that numbers’ floatingpoint parts contain enough bits and that their exponent fields are large enough so that individual results of floating-point operations are accurate to about 1 part in 10^5. So our driver doesn't disobey the spec.
I had a thorough check with all 21 cases in this repo, and found this case is not the only one that has rendering issue. 3 other cases (dynamic-cubemap, fishtank and spacerocks) also have rendering issue with same root cause. [dynamic-cubemap] Issue: centre cube is not rendered Workaround:

--- a/dynamic-cubemap/dynamic-cubemap.html
+++ b/dynamic-cubemap/dynamic-cubemap.html
@@ -567,7 +567,8 @@ uniform samplerCube skybox;
uniform mat4 viewDirectionProjectionInverse;
varying vec4 v_position;
void main() {
vec4 t = viewDirectionProjectionInverse * v_position;
+ highp float one = 1.0;
+ vec4 t = one * viewDirectionProjectionInverse * v_position;

[fishtank] Issue: Background is more blurred, and has some strips. Workaround: The problem is in "refractSkyboxFragmentShader", and if we have a change as below, the display will be OK.

void main() {
vec4 fragment = viewProjectionInverse * v_position;
+ highp float one = 1.0;
+ vec4 fragment = one * viewProjectionInverse * v_position;

[spacerocks] Issue: Background should be colorful for whole screen. But now the background outside a circle is green. Workaround: The problem is in "skyboxFragmentShader", and if we change it as below, display will be normal.

void main() {
vec4 t = (viewDirectionProjectionInverse * v_position);
+ highp float one = 1.0;
+ vec4 t = (one * viewDirectionProjectionInverse * v_position);

Solution Proposal After many discussion with our graphics team, we agree the precision part is weak in GLES spec, and the fix can be either in driver or applications. We do realize there might be many applications outside that have this same issue. But at the same time, our graphics team still thinks FP16 is a critical feature for Intel GPU in terms of performance and power, and wants to have your opinion on the solution. I will upload a fix as I suggested last time, and we will appreciate any comments from you. Thanks a lot!

greggman commented 8 years ago

My thoughts would be if it breaks this it probably breaks lots of content around all over the place. Maybe @mrdoob, @kenrussell, @toji have some thoughts

mrdoob commented 8 years ago

The solution for this was to add some code to check what's the maximum precision the target platform supports.

https://github.com/mrdoob/three.js/blob/master/src/renderers/webgl/WebGLCapabilities.js#L3-L31

greggman commented 8 years ago

That's not the issue here AFAIK. Do Android and iOS claim highp or mediump? I thought they claimed meduimp and yet the code above works on both so in this case your detection wouldn't help. See above, the shader is already marked as mediump

greggman commented 8 years ago

Checking my Nexus 5 returns 10 for precision in mediump (vs 23 on my mac) and yet the sample runs with no issues

kenrussell commented 8 years ago

Also no issues on a Nexus 5X. Several Android devices use real mediump precision rather than emulating it with highp precision. It seems to me that something is wrong in Intel's driver. Maybe it needs to use higher precision for some of the built-in functions like mix(), or do operations like matrix multiplication in higher precision and then truncate the result back to medium precision.

gyagp commented 8 years ago

Thank you all for the comments! The Nexus5 and Nexus 5X have no problem with this, may simply because they prompt the precision during some operations, like mix() here. Just specific to this case, the implementation of mix(x, y, a) in Intel driver uses mediump, maximum precision of operands, in this case. We can easily workaround the issue by "multiplying highp float 1.0" to either x, y or a. But I think using mediump follows the spec well. Do you think it's driver's duty to prompt precision for some specific operations? You know, this would hurt general performance and power.

kenrussell commented 8 years ago

@gyagp I doubt the N5 and N5X are promoting the precision of the operands to highp. Rather, it seems they are doing the internal work of the mix() operator at higher precision, and then truncating the result back to mediump.

I suggest Intel do this. The current algorithm in Intel's driver is causing these rendering artifacts that aren't seen on other GPUs. The solution is not to adjust all content to use highp, but for Intel to change the behavior of its driver.

gyagp commented 8 years ago

@kenrussell What I wanted to express is same with you:) I already brought your valuable feedback to GFX team for discussion. Thanks a lot!

gyagp commented 8 years ago

Intel will handle this in driver. Thank you all for the feedback, and I will close the PR accordingly.

WebGLSamples / WebGLSamples.github.io

Graphics artifacts can be seen with aquarium #4

ifdef GL_FRAGMENT_PRECISION_HIGH

else

endif