NNEDI3: #pragma optionNV(fastprecision on) causes issues on some GPUs

bjin / mpv-prescalers

prescalers for mpv, as user shaders

GNU Lesser General Public License v3.0

355 stars 34 forks source link

NNEDI3: #pragma optionNV(fastprecision on) causes issues on some GPUs #19

Closed Shudouken closed 6 years ago

Shudouken commented 6 years ago

I have a GeForce GTX 970 and those pragma lines seem to cause trouble for me.

raw.png raw

nnedi3-nns32-win8x4.hook from master

mpv --no-config --window-scale=4 --vf=format=gray --opengl-shader=nnedi3-nns32-win8x4.hook --pause raw.png

fastprecision_on

nnedi3-nns32-win8x4.hook with all occurences of '#pragma optionNV(fastprecision on)' removed

mpv --no-config --window-scale=4 --vf=format=gray --opengl-shader=nnedi3-nns32-win8x4-nopragma.hook --pause raw.png

fastprecision_off

bjin commented 6 years ago

It's likely that low floating number precision optimization is applied to texture indexing coordinates as well, which could mess up the samplings completely. Ideally I would like the optimization applied only to NNEDI3 convolution kernels (which is in full single float precision without the pragma), but the optionNV pragma is a all-or-nothing deal.

Unfortunately with desktop OpenGL there is no working precision qualifier to specify precision for convolution kernels. A quick Google search shows that a recent OpenGL extension (AMD_gpu_shader_half_float) might be useful but it's AMD only ATM.

For now I think I will just remove the optimization flag for nvidia card.

haasn commented 6 years ago

@bjin You could try making it a compute shader (even if you don't use shared memory, just add //!COMPUTE 8 8 or so); that would then allow you to do the indexing calculations entirely with integers, texelFetch and gl_GlobalInvocationID

bjin commented 6 years ago

@haasn Good idea. (#11 for reference purpose).

bjin commented 6 years ago

Turns out compute shader doesn't solve the problem. To actually address the issue, one need to dump and read the GLSL assembly in nvidia's proprietary format, to understand which actually happened. Considering it's an obsolete optimization flag with limited docs, I think I'm fine without it.

bjin commented 6 years ago

Turns out that both AMD_gpu_shader_half_float and NV_gpu_shader5 provides support of float16_t. There are also functions to pack and unpack two float from an int32 (packFloat2x16 and unpackFloat2x16). On the other hand, there is also a standardized type qualifier precise from ARB_gpu_shader5 which could help us workaround the texture coordinate issue, but it's less relevant compare to direct half float support.