edwar-vhd / SFU-Piecewise-Polynomial-Approximation

Special Function Units (SFUs) are hardware accelerators that help GPUs execute some of their most complex operations faster. This SFU implements piecewise polynomial approximation, which provides high performance, low area cost and good accuracy in real hardware implementations.

Is this right? Analyzing the format of the coefficient LUTs c0, c1, c2. X2 = [.xm+1 ... xn] * 2^-m. #3

Open dalxung opened 1 year ago

dalxung commented 1 year ago
  1. Software used to generate the coefficients: Maplesoft's Maple. In the SFU, each coefficient LUT (c0, c1, c2) is a coefficient array indexed by m bits (m = 6), so each array has 64 (2^6) elements. The stored bit widths of c0, c1 and c2 are t, p and q, respectively.

    Formats of the quadratic polynomial coefficients c0, c1, c2 for sin(x), cos(x), rsqrt(x), log2(x), exp2(x), 1/x and sqrt(x):

                                c0              c1              c2
        1/x                    +0.1xxxx...xx,  -0.xxxxx...xx,  +0.xxxxx...xx
        sqrt(x)                +1.0xxxx...xx,  +0.01xxx...xx,  -0.000xx...xx
        rsqrt(x)               +0.1xxxx...xx,  -0.0xxxx...xx,  +0.0xxxx...xx
        exp2(x)                +1.xxxxx...xx,  +x.xxxxx...xx,  +0.0xxxx...xx
        log2(x)                +0.xxxxx...xx,  +x.xxxxx...xx,  -0.xxxxx...xx
        sin(x), cos(x)         +0.xxxxx...xx,  +x.xxxxx...xx,  -0.0xxxx...xx
Is this correct?

Analyzing the format of the coefficient LUTs c0, c1, c2: for sin(x), for example, c0 has the pattern +0.xxxxx. If the leading part of a coefficient is always the same bit string (here '+0'), it does not need to be stored. Only the .xxxxx bits need to be saved in the LUT array, and the fixed '+0' is simply prepended to the LUT output.
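To make the question concrete, here is a minimal Python sketch of the idea described above: the LUT holds only the variable fraction bits, and the constant prefix is concatenated back in front of them. The function name `restore_coefficient` and its parameters are hypothetical and not taken from the repository; the bit widths in the examples are arbitrary.

```python
def restore_coefficient(stored, width, sign=+1, int_part=0, fixed_frac=""):
    """Rebuild a coefficient value from its stored LUT bits.

    stored     -- integer holding the `width` variable bits read from the LUT
    sign       -- implicit sign (+1 or -1), not stored in the LUT
    int_part   -- implicit integer part (e.g. 0 for "+0.xxx", 1 for "+1.xxx")
    fixed_frac -- implicit constant fraction bits right after the binary
                  point, as a bit string (e.g. "1" for the pattern "+0.1xxx")
    """
    if not 0 <= stored < (1 << width):
        raise ValueError("stored value does not fit in the given width")
    k = len(fixed_frac)
    fixed = int(fixed_frac, 2) if fixed_frac else 0
    # Concatenate: constant fraction bits first, then the stored LUT bits.
    frac = (fixed << width) | stored
    return sign * (int_part + frac / (1 << (k + width)))


# Pattern "+0.xxxx" with stored bits 1000 -> 0.1000b = 0.5
print(restore_coefficient(0b1000, 4))
# Pattern "+0.1xxxx" with stored bits 1000 -> 0.11000b = 0.75
print(restore_coefficient(0b1000, 4, fixed_frac="1"))
# Pattern "-0.0xxxx" with stored bits 1000 -> -0.01000b = -0.25
print(restore_coefficient(0b1000, 4, sign=-1, fixed_frac="0"))
```

In hardware the same step is a plain wire concatenation, so dropping the constant prefix from the LUT costs no extra logic.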

  2. A second-order approximation polynomial for a transcendental function has the form f(x) = C0(XH) + C1(XH)*XL + C2(XH)*XL^2. The fraction field of the 32-bit floating-point input is n bits wide, and the input argument x of the function f is divided into an upper part XH of m bits and a lower part XL of (n-m) bits. The coefficients C0, C1, C2 are generated per value of XH, and XH is used as the select index into the C0, C1, C2 coefficient LUTs when the transcendental function is called.

When performing the operation C2*X2^2 + C1*X2 + C0 (that is, C2(XH)*XL^2 + C1(XH)*XL + C0(XH)) with m = 7, X1 = XH and X2 = XL. X1 = [.x1 x2 ... xm] is used as the LUT index. X2 = [.xm+1 ... xn] * 2^-m, that is, X2 = XL (the bit string interpreted as a numeric value) * 2^-7.
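The split and the evaluation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the repository's implementation: the helper names are hypothetical, the arithmetic is floating point rather than fixed point, and the tables come from a toy function f(x) = x^2 (for which c0 = X1^2, c1 = 2*X1, c2 = 1 are exact) instead of the minimax coefficients generated with Maple.

```python
def split_fraction(frac_bits, n, m):
    """Split an n-bit fraction into the LUT index XH and the value X2."""
    if not 0 <= frac_bits < (1 << n):
        raise ValueError("fraction does not fit in n bits")
    xl_width = n - m
    xh = frac_bits >> xl_width                # upper m bits: LUT index (X1)
    xl = frac_bits & ((1 << xl_width) - 1)    # lower n-m bits
    # X2 = [.x_{m+1} ... x_n] * 2^-m, i.e. XL as an integer scaled by 2^-n
    x2 = xl / (1 << n)
    return xh, x2


def eval_quadratic(c0, c1, c2, frac_bits, n, m):
    """Evaluate C0[XH] + C1[XH]*X2 + C2[XH]*X2^2 in Horner form."""
    xh, x2 = split_fraction(frac_bits, n, m)
    return c0[xh] + x2 * (c1[xh] + x2 * c2[xh])


def toy_square_tables(m):
    """Toy tables for f(x) = x^2 on [0, 1); exact, for illustration only."""
    size = 1 << m
    x1 = [i / size for i in range(size)]
    return [v * v for v in x1], [2.0 * v for v in x1], [1.0] * size


n, m = 23, 6                      # 23-bit fraction, 64-entry tables
c0, c1, c2 = toy_square_tables(m)
frac = 0x400001                   # x = 0.5 + 2^-23
print(split_fraction(frac, n, m))
print(eval_quadratic(c0, c1, c2, frac, n, m), (frac / 2**n) ** 2)
```

Changing `m` from 6 to 7 only changes the table size and the width of XL; the evaluation itself stays the same.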

  3. Classification by coefficient LUT size. There are two coefficient LUT sizes: 2^6 = 64 and 2^7 = 128. The LUT[64] tables are ln2e0, exp, cos, sin and sqrt; the LUT[128] tables are In2, reci_sqrt_2_4, reci_sqrt_1_2 and reci. Is the in2 table the coefficient table for log2? When is ln2e0 used, and when is In2 used? When is reci_sqrt_2_4 used, and when is reci_sqrt_1_2 used?
divadnauj-GB commented 1 month ago

Hi, sorry for the delayed answer; I've been busy working on other projects.

First, we do not store the constant parts of the coefficients, only the .xxxxx values. Second, some file names related to the log2 function may be misleading (read log2 instead of in2). For more details on the use of the different intervals, (1,2) or (2,4), for the reciprocal and rsqrt functions, see the following reference paper: J.-A. Pineiro, S. F. Oberman, J.-M. Muller and J. D. Bruguera, "High-speed function approximation using a minimax quadratic interpolator," IEEE Transactions on Computers, vol. 54, no. 3, pp. 304-318, March 2005, doi: 10.1109/TC.2005.52.
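As a rough illustration of why rsqrt can need two intervals, the sketch below assumes the usual exponent-parity range reduction: the exponent is made even so that it can be halved exactly, which leaves the significand either in [1,2) or in [2,4). Whether the repository selects its tables exactly this way is an assumption; the function name is hypothetical, and the table names are reused from the question above only as labels.

```python
import math


def rsqrt_range_reduce(x):
    """Return (mant, table, k) such that rsqrt(x) = rsqrt(mant) * 2**k.

    mant lies in [1, 2) when the unbiased exponent is even and in [2, 4)
    when it is odd, so that the remaining exponent can be halved exactly.
    """
    if x <= 0.0:
        raise ValueError("rsqrt range reduction needs a positive input")
    frac, exp = math.frexp(x)          # x = frac * 2**exp, frac in [0.5, 1)
    mant, exp = frac * 2.0, exp - 1    # renormalize: mant in [1, 2)
    if exp % 2 == 0:
        return mant, "reci_sqrt_1_2", -exp // 2
    # Odd exponent: move one factor of 2 into the significand.
    return mant * 2.0, "reci_sqrt_2_4", -(exp - 1) // 2


for x in (4.0, 8.0, 0.3):
    mant, table, k = rsqrt_range_reduce(x)
    # The polynomial would approximate rsqrt(mant); use the exact value here.
    print(x, table, mant ** -0.5 * 2.0 ** k, x ** -0.5)
```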