vc4: No support for DDX/DDY

anholt commented 8 years ago

This might be doable with the MUL output rotation support

xranby commented 8 years ago

Is this the same unimplemented DDX/DDY support that is required by fixed-function OpenGL applications, which implicitly rely on accurate derivatives when they enable mipmapped or anisotropic texture fetches? It was reported in the following Raspberry Pi forum thread and in a downstream Raspbian userland bug report: https://www.raspberrypi.org/forums/viewtopic.php?p=917912#p917912 https://github.com/raspberrypi/userland/issues/289

When using this driver and trying to run a Java program (in my specific case, BlueJ) with the OpenGL pipeline enabled:

java -Dsun.java2d.opengl=true -Dawt.useSystemAAFontSettings=on -cp "/usr/share/bluej/bluej.jar:/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/lib/tools.jar" bluej.Boot

I get the following message:
unknown NIR ALU inst: vec1 ssa_371 = fddx ssa_316
Aborted

xranby commented 8 years ago

DDX/DDY can be supported by adding the missing implementation for nir_op_fddx and nir_op_fddy into https://github.com/anholt/mesa/blob/master/src/gallium/drivers/vc4/vc4_program.c#L819-L1087

    static void
    ntq_emit_alu(struct vc4_compile *c, nir_alu_instr *instr)
    {
            ...
            switch (instr->op) {
            ...
            case nir_op_fddx:
                    ...
                    break;
            case nir_op_fddy:
                    ...
                    break;

I need to read up and research a bit more, and will then try to make a fix for this issue.

xranby commented 8 years ago

@anholt do you know where to find public documentation that describes this MUL output rotation support for the VC4? Do you mean a "simple" affine multiplication transformation with a fixed rotation matrix? Do you have other documentation that would help someone dipping their toes in to understand accurate DDX/DDY derivatives better?

anholt commented 8 years ago

This is probably one of the harder things to start with. It's the same 3D documentation that was released in 2014, though. http://www.broadcom.com/docs/support/videocore/VideoCoreIV-AG100-R.pdf

xranby commented 8 years ago

I hope this hard problem can be solved by collecting and studying everything currently written down on the subject:

The OpenGL GLSL specification:
https://www.opengl.org/sdk/docs/man4/html/dFdx.xhtml
https://www.opengl.org/registry/doc/GLSLangSpec.4.40.pdf (section 8.13.1, page 186)

The OpenGL ES GLSL specification:
https://www.khronos.org/registry/gles/extensions/OES/OES_standard_derivatives.txt

Derivatives may be computationally expensive and/or numerically unstable.
Therefore, an OpenGL ES implementation may approximate the true derivatives by using a fast but not entirely accurate derivative computation.

The expected behavior of a derivative is specified using forward/backward 
differencing.

Forward differencing:

F(x+dx) - F(x)   is approximately equal to    dFdx(x).dx                  1a

dFdx(x)          is approximately equal to    F(x+dx) - F(x)              1b
                                              --------------
                                                   dx

Backward differencing:

F(x-dx) - F(x)   is approximately equal to    -dFdx(x).dx                 2a

dFdx(x)          is approximately equal to    F(x) - F(x-dx)              2b
                                              --------------
                                                   dx

With single-sample rasterization, dx <= 1.0 in equations 1b and 2b.  For
multi-sample rasterization, dx < 2.0 in equations 1b and 2b.

dFdy is approximated similarly, with y replacing x.

An OpenGL ES implementation may use the above or other methods to perform
the calculation, subject to the following conditions:

1. The method may use piecewise linear approximations.  Such linear
   approximations imply that higher order derivatives, dFdx(dFdx(x)) and
   above, are undefined.

2. The method may assume that the function evaluated is continuous.
   Therefore derivatives within the body of a non-uniform conditional are
   undefined.

3. The method may differ per fragment, subject to the constraint that the
   method may vary by window coordinates, not screen coordinates.  The
   invariance requirement described in section 3.1 of the OpenGL ES 2.0 
   specification is relaxed for derivative calculations, because the method 
   may be a function of fragment location.

Other properties that are desirable, but not required, are:

4. Functions should be evaluated within the interior of a primitive
   (interpolated, not extrapolated).

5. Functions for dFdx should be evaluated while holding y constant.
   Functions for dFdy should be evaluated while holding x constant.  
   However, mixed higher order derivatives, like dFdx(dFdy(y)) and 
   dFdy(dFdx(x)) are undefined.

6. Derivatives of constant arguments should be 0.

In some implementations, varying degrees of derivative accuracy may be
obtained by providing GL hints (section 5.6 of the OpenGL ES 2.0
specification), allowing a user to make an image quality versus speed trade
off.
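
To make equations 1b and 2b from the quoted extension text concrete, here is a minimal C sketch. The helper names (`dfdx_forward`, `dfdx_backward`, `linear_f`, `square_f`) are hypothetical and exist only for illustration; nothing here is taken from the driver.

```c
#include <assert.h>

/* Hypothetical test functions, not from any spec or driver. */
float linear_f(float x) { return 3.0f * x + 1.0f; }  /* true derivative: 3  */
float square_f(float x) { return x * x; }            /* true derivative: 2x */

/* Equation 1b: dFdx(x) ~= (F(x + dx) - F(x)) / dx */
float dfdx_forward(float (*f)(float), float x, float dx)
{
    assert(dx > 0.0f);
    return (f(x + dx) - f(x)) / dx;
}

/* Equation 2b: dFdx(x) ~= (F(x) - F(x - dx)) / dx */
float dfdx_backward(float (*f)(float), float x, float dx)
{
    assert(dx > 0.0f);
    return (f(x) - f(x - dx)) / dx;
}
```

For a linear function both differences agree with the true derivative. For `square_f` at x = 4 with dx = 1 they bracket the true value of 8 (forward gives 9, backward gives 7), which is the kind of inaccuracy the extension explicitly permits.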

This blog post confirms that a fixed-function OpenGL driver will use DDX/DDY when mipmapping is enabled: http://hacksoflife.blogspot.se/2011/01/derivatives-i-discontinuities-and.html

How does OpenGL know what mipmap level to use when you sample a texture in your GLSL shader with texture2D? The answer is that this:
texture2D(my_texture,uv);
actually does something like this:
texture2DGrad(my_texture,uv,dFdx(uv),dFdy(uv));
... Where Do Derivatives Come From?

The GLSL derivative functions are usually implemented by differencing - that is, the GPU takes a block of 2x2 pixels and differences the variable or expression passed to dFdx and dFdy, to calculate an 'approximate' derivative. Many GPUs rasterize 2x2 clusters of pixels at a time, with the shader instructions for the four pixels run in lock-step, so the hardware can be set up to efficiently "cross" the four texels to find our derivatives.

xranby commented 8 years ago

Both the OpenGL and OpenGL ES specifications allow us to implement DDX/DDY with fast approximations

such as forward or backward differences. Math to the rescue! http://mathworld.wolfram.com/ForwardDifference.html http://mathworld.wolfram.com/BackwardDifference.html

or other hacks such as "crossing" the values of the four pixels in a 2x2 block

or... other known math/hacks?

jonasarrow commented 8 years ago

I wrote a simple, dirty hack that implements ddx/ddy support. The test cases pass, and so do some simple tests of my own.

Attached is a patchset; maybe someone with more experience can look through it, especially the constraints (reads only from r0-r5, and last write not to the write register).

dfdxy_only.txt

How it works:

Principle: forward and backward differencing within the quad. For that we need to know where in the quad we are. The patch uses the "load imm per-elmt unsigned" instruction for this, which may be substituted by a read from REG_A_ELEMENT_NUMBER. Then we rotate the variable forward and backward. Because we know whether our pixel needs forward or backward differencing (from its position in the quad), we subtract our value from one of the rotated values.
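
The idea can be simulated on the CPU with a small C sketch. This is not the code from the patch: it assumes a 16-lane vector in which lanes 4q..4q+3 form one quad, with bit 0 of the lane number selecting the column and bit 1 selecting the row, and it models vector rotation as a plain array shuffle. The names `rotate`, `simd_dfdx` and `simd_dfdy` are made up for illustration, and the loop index stands in for the per-element number.

```c
#define LANES 16

/* Simulated rotate: lane i of dst receives lane (i - n) mod 16 of src. */
void rotate(const float src[LANES], unsigned n, float dst[LANES])
{
    for (unsigned i = 0; i < LANES; i++)
        dst[i] = src[(i + LANES - (n % LANES)) % LANES];
}

/* Difference against the neighbour `stride` lanes away, selected by `bit`. */
static void simd_diff(const float src[LANES], unsigned stride, unsigned bit,
                      float out[LANES])
{
    float from_prev[LANES], from_next[LANES];

    rotate(src, stride, from_prev);          /* lane i sees lane i - stride */
    rotate(src, LANES - stride, from_next);  /* lane i sees lane i + stride */

    for (unsigned i = 0; i < LANES; i++) {
        if ((i & bit) == 0)
            out[i] = from_next[i] - src[i];  /* forward difference  */
        else
            out[i] = src[i] - from_prev[i];  /* backward difference */
    }
}

/* Assumed layout: horizontal neighbour is 1 lane away, vertical is 2. */
void simd_dfdx(const float src[LANES], float out[LANES])
{
    simd_diff(src, 1, 1, out);
}

void simd_dfdy(const float src[LANES], float out[LANES])
{
    simd_diff(src, 2, 2, out);
}
```

Selecting forward or backward differencing by lane position means no lane ever reads a value from a neighbouring quad, so the wrap-around of the rotation never affects the result.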

anholt commented 8 years ago

Sorry for the delay in replying -- I've been off in kernel land.

This is really cool! Thanks for playing with it. You could do something like QOP_FRAG_REVFLAG for getting at the special register, but I'd actually been working on a series recently to make regs like that be QFILE* definitions like we do for VPM reads. Let me clean that one up and push it.

Echelon9 commented 8 years ago

@anholt Saw https://github.com/anholt/mesa/tree/vc4-derivs Very keen to give this a test on my hardware and get the patches reviewed on the mailing list!

anholt commented 8 years ago

Landed:

commit 00c72acba5a98965622000d949b6835f28a9d71a
Author: Eric Anholt <eric@anholt.net>
Date: Thu Aug 25 12:32:19 2016 -0700

vc4: Add support for fddx/fddy

Based vaguely on a patch by jonasarrow on github.

Many thanks to @jonasarrow -- while I ended up mostly rewriting it into the final patch series, it's their basic idea.

anholt / mesa

vc4: No support for DDX/DDY #12
