vc4: Inaccurate exp2/log2

anholt commented 8 years ago

Our math unit gives you fast, inaccurate math (piglit tests fs-log2-float, fs-exp2-float). To meet GL requirements, we need to do some Newton steps for improving the accuracy of our answers. I've done this for RCP and RSQ but not LOG2 and EXP2.

Echelon9 commented 8 years ago

I've spent some time reading the literature and testing some implementations to improve the approximation of LOG2 / EXP2. No patches yet, but thought helpful for others passing through these areas to have a record of my notes.

I am not a number theory master. However, it appears Newton-Raphson method is not viable to improve an inaccurate log2() or exp2() approximation received from the vc4 hardware.

Newton-Raphson method is a root finding method.
It relies upon the first derivative of the root we are trying to find.
For log2() roots, the first derivative includes exp2() (and vice versa).
i.e. An implementation with error prone log2() and exp2() continues to accumulate errors, rather than converging for each iteration. Regardless of the iterations run.
This issue is unique to the transcendental functions as compared to RCP and RSQ.

For an example of such a (non-working) implementation see: https://github.com/Echelon9/mesa/compare/fix/log2-newton-raphson-v1

I've started a review of the academic literature to see how this problem of approximating transcendental functions may have been solved with graphics hardware. The goal is to discover if any learnings can help here with the vc4's fact, inaccurate SFU math unit.

The range of classes to approach the problem are typically described as:

Iterative:
- Digit recurrence
- Functional iteration (e.g. Newton-Raphson or Goldschmidt's algorithm)
Non-iterative:
- Direct table lookup
- Polynomial
- Rational approximations
- Table-based methods

Importantly, functional iteration methods are orientated to division and square root computations only and do not allow computation of logarithm, exponential or trigonometric functions.

Source: S. Oberman and M. Siu (2005); J.-A. Pineiro and M.D. Ercegovac (2003, 2004, 2005) and M.D. Ercegovac (1973)

One pre-optimisation step that may assist our fast, inaccurate vc4 math problem is range reduction. This step is often carried out as the initial step when approximating a function (the second is the actual functional approximation, and the final step is reconstruction).

The approximation is performed in an input interval [a, b) and range reduction is used for values outside this interval. For the elementary functions there are natural selections of the intervals [a, b) which simplify both the steps of range reduction and functional approximation (Jose-Alejandro Pineiro et al, 2005).

If a single-precision floating point number is represented in terms of a mantissa and an exponent:

Mx = X * 2 ^ Ex
# where Ex is an integer and X is between 1 and 2.

Then for logarithm the range reduction is:

# log2(Mx) = log2(X) + Ex * log2(2)
# The latter term, Ex * log2(2), is just a multiplication by a known constant.

log2(Mx) = log2(X) + Ex, for X in (1, 2) if Ex =/= 0, or
log2(Mx) = (X - 1) [log2(X)/(X - 1)], for X in (1, 2) if Ex = 0

Range reduction will assist us if the errror term of the vc4 hardware implemented log2() and exp2() approximations are not monotonic i.e. sufficiently accurate between (1, 2).

jonasarrow commented 8 years ago

The hardware is precise to the first some digits. The range reduction does not help. (If you look at the test cases, they already operate on a limited range.)

The problem is, that there is no precise model of the hardware computation model present yet. Because then you could for example cut the last digits and perform some taylor steps.

It is possible to implement the functions precise enough with taylor expansion and range reduction, but the performance Penalty is so high, that any non-trivial shader is so slow and so complex, it might not even compile. (You need around 6th order to get to the desired precision).

A possible solution for a software only solution would be reimplementing the functions of the libc.

I think, the fastest approach would be leveraging the SFU function and then doing some magic trickery afterwards.

EXP: If you know, how much the SFU at maximum goes above and below of the real result, you could multiply the source by a factor which compensates the difference to the lower, then you cut the result at the last known precise digit and compute the function again. (Or something like that)

exp(a) = exp(floor(a))*exp(ffract(a)) = e^-3*exp(a*3) = e^-3*exp(floor(a*3))*e^-3*exp(ffract(a*3))

LOG:

log(a) = 1/n * log(a^n)

Personally, I wouldn't fix this as long as no software truly relies on the precision of these functions. On a platform, which is so low-power and low-performance a 10 to 100 fold hit for using a slightly more precise function seems to be a trade off which most of the developer aren't willing to take.

anholt / mesa

vc4: Inaccurate exp2/log2 #8