Closed Shudouken closed 6 years ago
It's likely that low floating number precision optimization is applied to texture indexing coordinates as well, which could mess up the samplings completely. Ideally I would like the optimization applied only to NNEDI3 convolution kernels (which is in full single float precision without the pragma), but the optionNV
pragma is a all-or-nothing deal.
Unfortunately with desktop OpenGL there is no working precision qualifier to specify precision for convolution kernels. A quick Google search shows that a recent OpenGL extension (AMD_gpu_shader_half_float
) might be useful but it's AMD only ATM.
For now I think I will just remove the optimization flag for nvidia card.
@bjin You could try making it a compute shader (even if you don't use shared memory, just add //!COMPUTE 8 8
or so); that would then allow you to do the indexing calculations entirely with integers, texelFetch and gl_GlobalInvocationID
@haasn Good idea. (#11 for reference purpose).
Turns out compute shader doesn't solve the problem. To actually address the issue, one need to dump and read the GLSL assembly in nvidia's proprietary format, to understand which actually happened. Considering it's an obsolete optimization flag with limited docs, I think I'm fine without it.
Turns out that both AMD_gpu_shader_half_float
and NV_gpu_shader5
provides support of float16_t
. There are also functions to pack and unpack two float from an int32 (packFloat2x16
and unpackFloat2x16
). On the other hand, there is also a standardized type qualifier precise
from ARB_gpu_shader5
which could help us workaround the texture coordinate issue, but it's less relevant compare to direct half float support.
I have a GeForce GTX 970 and those pragma lines seem to cause trouble for me.
raw.png
nnedi3-nns32-win8x4.hook from master
nnedi3-nns32-win8x4.hook with all occurences of '#pragma optionNV(fastprecision on)' removed