anholt closed this issue 8 years ago
Is this the same unimplemented DDX/DDY that is required to support fixed-function OpenGL applications, which implicitly use accurate derivatives when they enable mipmapped or anisotropic texture fetches? It was reported in the following Raspberry Pi forum thread and downstream Raspbian userland bug report:
https://www.raspberrypi.org/forums/viewtopic.php?p=917912#p917912
https://github.com/raspberrypi/userland/issues/289
When using this driver and trying to run a Java program (in my specific case, BlueJ) with the OpenGL pipeline enabled:
java -Dsun.java2d.opengl=true -Dawt.useSystemAAFontSettings=on -cp "/usr/share/bluej/bluej.jar:/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/lib/tools.jar" bluej.Boot
I get the following message:
unknown NIR ALU inst: vec1 ssa_371 = fddx ssa_316
Aborted
DDX/DDY can be supported by adding the missing implementation for nir_op_fddx and nir_op_fddy into https://github.com/anholt/mesa/blob/master/src/gallium/drivers/vc4/vc4_program.c#L819-L1087
static void
ntq_emit_alu(struct vc4_compile *c, nir_alu_instr *instr)
{
        ...
        switch (instr->op) {
        ...
        case nir_op_fddx:
                ...
                break;
        case nir_op_fddy:
                ...
                break;
        ...
        }
}
I need to read up and research a bit more, and will then try to make a fix for this issue.
@anholt do you know where public documentation is available that describes this MUL output rotation support for the VC4? Do you mean a "simple" affine multiplication transformation with a fixed rotation matrix? Do you have other documentation that would help someone dipping their toes in to better understand accurate DDX/DDY derivatives?
This is probably one of the harder things to start with. It's the same 3D documentation that was released in 2014, though. http://www.broadcom.com/docs/support/videocore/VideoCoreIV-AG100-R.pdf
I hope this hard problem can be solved by collecting and studying all the knowledge currently written down about this subject:
The OpenGL GLSL specification:
https://www.opengl.org/sdk/docs/man4/html/dFdx.xhtml
https://www.opengl.org/registry/doc/GLSLangSpec.4.40.pdf (section 8.13.1, page 186)
The OpenGL ES GLSL specification:
https://www.khronos.org/registry/gles/extensions/OES/OES_standard_derivatives.txt
Derivatives may be computationally expensive and/or numerically unstable.
Therefore, an OpenGL ES implementation may approximate the true derivatives by using a fast but not entirely accurate derivative computation.

The expected behavior of a derivative is specified using forward/backward differencing.

Forward differencing:

F(x+dx) - F(x) is approximately equal to dFdx(x).dx (1a)
dFdx(x) is approximately equal to (F(x+dx) - F(x)) / dx (1b)

Backward differencing:

F(x-dx) - F(x) is approximately equal to -dFdx(x).dx (2a)
dFdx(x) is approximately equal to (F(x) - F(x-dx)) / dx (2b)

With single-sample rasterization, dx <= 1.0 in equations 1b and 2b. For multi-sample rasterization, dx < 2.0 in equations 1b and 2b.

dFdy is approximated similarly, with y replacing x.

An OpenGL ES implementation may use the above or other methods to perform the calculation, subject to the following conditions:

1. The method may use piecewise linear approximations. Such linear approximations imply that higher order derivatives, dFdx(dFdx(x)) and above, are undefined.
2. The method may assume that the function evaluated is continuous. Therefore derivatives within the body of a non-uniform conditional are undefined.
3. The method may differ per fragment, subject to the constraint that the method may vary by window coordinates, not screen coordinates. The invariance requirement described in section 3.1 of the OpenGL ES 2.0 specification is relaxed for derivative calculations, because the method may be a function of fragment location.

Other properties that are desirable, but not required, are:

4. Functions should be evaluated within the interior of a primitive (interpolated, not extrapolated).
5. Functions for dFdx should be evaluated while holding y constant. Functions for dFdy should be evaluated while holding x constant. However, mixed higher order derivatives, like dFdx(dFdy(y)) and dFdy(dFdx(x)) are undefined.
6. Derivatives of constant arguments should be 0.
In some implementations, varying degrees of derivative accuracy may be obtained by providing GL hints (section 5.6 of the OpenGL ES 2.0 specification), allowing a user to make an image quality versus speed trade off.
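To make equations 1b and 2b from the quoted extension text concrete, here is a minimal C sketch of forward and backward differencing on a scalar function. The names `forward_difference`, `backward_difference` and `square` are made up for illustration and are not taken from Mesa or from the specification.

```c
#include <assert.h>

/* Hypothetical helper type: any scalar function F(x). */
typedef float (*scalar_fn)(float);

/* Equation 1b: dFdx(x) ~= (F(x+dx) - F(x)) / dx */
float forward_difference(scalar_fn f, float x, float dx)
{
    assert(dx != 0.0f);
    return (f(x + dx) - f(x)) / dx;
}

/* Equation 2b: dFdx(x) ~= (F(x) - F(x-dx)) / dx */
float backward_difference(scalar_fn f, float x, float dx)
{
    assert(dx != 0.0f);
    return (f(x) - f(x - dx)) / dx;
}

/* Example function F(x) = x^2, whose true derivative is 2x. */
float square(float x)
{
    return x * x;
}
```

For F(x) = x^2 at x = 3 with dx = 1, the forward difference gives 7 and the backward difference gives 5, which bracket the true derivative of 6. That is the kind of "fast but not entirely accurate" result the extension text permits.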
This blog post confirms that a fixed-function OpenGL driver will use DDX/DDY when mipmapping is enabled. http://hacksoflife.blogspot.se/2011/01/derivatives-i-discontinuities-and.html
How does OpenGL know what mipmap level to use when you sample a texture in your GLSL shader with texture2D? The answer is that this:
texture2D(my_texture,uv);
actually does something like this:
texture2DGrad(my_texture,uv,dFdx(uv),dFdy(uv));
... Where Do Derivatives Come From?
The GLSL derivative functions are usually implemented by differencing - that is, the GPU takes a block of 2x2 pixels and differences the variable or expression passed to dFdx and dFdy, to calculate an 'approximate' derivative. Many GPUs rasterize 2x2 clusters of pixels at a time, with the shader instructions for the four pixels run in lock-step, so the hardware can be set up to efficiently "cross" the four texels to find our derivatives.
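As a rough illustration of the 2x2 differencing the blog post describes, the following C sketch computes per-pixel dFdx(uv) and dFdy(uv) for one quad, i.e. the values a texture2DGrad-style lookup would be handed. The `vec2` type, the `quad[row][col]` layout and the function name are assumptions made for this example, not something the blog or any driver defines.

```c
#include <assert.h>

/* Hypothetical 2-component texture coordinate. */
typedef struct { float u, v; } vec2;

static vec2 vec2_sub(vec2 a, vec2 b)
{
    vec2 r = { a.u - b.u, a.v - b.v };
    return r;
}

/*
 * Assumed layout: quad[row][col], where col 0/1 are neighbours in x
 * and row 0/1 are neighbours in y. Each pixel gets the difference
 * across its own row (dFdx) and its own column (dFdy).
 */
void quad_derivatives(vec2 quad[2][2], vec2 ddx[2][2], vec2 ddy[2][2])
{
    assert(quad && ddx && ddy);
    for (int row = 0; row < 2; row++) {
        for (int col = 0; col < 2; col++) {
            ddx[row][col] = vec2_sub(quad[row][1], quad[row][0]);
            ddy[row][col] = vec2_sub(quad[1][col], quad[0][col]);
        }
    }
}
```

Note that the two pixels in a row share one dFdx value and the two pixels in a column share one dFdy value, so the left pixel effectively uses forward differencing and the right pixel backward differencing.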
Both the OpenGL and OpenGL ES specifications allow us to implement DDX/DDY with fast approximations
such as forward or backward differencing. Math to the rescue! http://mathworld.wolfram.com/ForwardDifference.html http://mathworld.wolfram.com/BackwardDifference.html
or other hacks such as "crossing" the four texels of a 2x2 block
or... other known math/hacks?
I wrote a simple/dirty implementation/hack for ddx/ddy support. The test cases pass, and so do some simple tests of my own.
Attached is a patchset, maybe someone with more experience can look through it. (Especially the constraints (reads only from r0-r5 and last write not to the write register))
Principle: forward and backward differencing within the quad. For that we need to know where in the quad we are. The patch uses the "load imm per-elmt unsigned" instruction for this, which may be substituted by a read from REG_A_ELEMENT_NUMBER. Then we rotate the variable forward and backward. Because we know whether our pixel needs forward or backward differencing (from its position in the quad), we subtract our value from one of the rotated values.
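The principle above can be simulated on the CPU. This is only a sketch in plain C, not the attached patch and not the code that eventually landed. It assumes a 16-lane vector in which bit 0 of the element number selects left/right within a quad and bit 1 selects top/bottom, and `rotate_lanes`, `quad_ddx` and `quad_ddy` are hypothetical names.

```c
#include <assert.h>

#define LANES 16

/* Simulated rotation: lane i receives the value from lane (i + n) mod LANES. */
static void rotate_lanes(const float *in, float *out, int n)
{
    for (int i = 0; i < LANES; i++)
        out[i] = in[((i + n) % LANES + LANES) % LANES];
}

/* Shared body: `step` is the lane distance to the neighbour (1 for x, 2 for y). */
static void quad_diff(const float *v, float *out, int step)
{
    float ahead[LANES], behind[LANES];

    assert(v && out);
    rotate_lanes(v, ahead, step);    /* neighbour at lane i + step */
    rotate_lanes(v, behind, -step);  /* neighbour at lane i - step */

    for (int i = 0; i < LANES; i++) {
        if ((i & step) == 0)
            out[i] = ahead[i] - v[i];   /* first pixel of the pair: forward difference */
        else
            out[i] = v[i] - behind[i];  /* second pixel of the pair: backward difference */
    }
}

void quad_ddx(const float *v, float *ddx) { quad_diff(v, ddx, 1); }
void quad_ddy(const float *v, float *ddy) { quad_diff(v, ddy, 2); }
```

Because each lane only ever uses the rotated value that comes from inside its own quad, the wrap-around at the ends of the vector never leaks into a result.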
Sorry for the delay in replying -- I've been off in kernel land.
This is really cool! Thanks for playing with it. You could do something like QOP_FRAG_REVFLAG for getting at the special register, but I'd actually been working on a series recently to make regs like that be QFILE* definitions like we do for VPM reads. Let me clean that one up and push it.
@anholt Saw https://github.com/anholt/mesa/tree/vc4-derivs Very keen to give this a test on my hardware and get the patches reviewed on the mailing list!
Landed:
commit 00c72acba5a98965622000d949b6835f28a9d71a
Author: Eric Anholt eric@anholt.net
Date: Thu Aug 25 12:32:19 2016 -0700
vc4: Add support for fddx/fddy
Based vaguely on a patch by jonasarrow on github.
Many thanks to @jonasarrow -- while I ended up mostly rewriting it into the final patch series, it's their basic idea.
This might be doable with the MUL output rotation support