Reduction decomposition appears to alter the program

lnarmour commented 3 months ago

Take the following two programs, which should be equivalent:

// pre.alpha:
affine energyEquations [N] -> {  : N >= 11 }
    inputs
        A : {[i]: 0 <= i < N }
    outputs
        B : {[i, j]: i >= 0 and i <= j < N }
    when {  : N >= 11 } let
        B[i,j] = reduce(+, (i,j,k,l->i,j), {: k >= i and k <= l <= j } : A[i]);
.

// post.alpha:
affine energyEquations [N] -> {  : N >= 11 }
    inputs
        A : {[i]: 0 <= i < N }
    outputs
        B : {[i, j]: i >= 0 and i <= j < N }
    when {  : N >= 11 } let
        B[i,j] = reduce(+, (i,j,k->i,j), reduce(+, (i,j,k,l->i,j,k), {: k >= i and k <= l <= j } : A[i]));
.

The second was obtained by calling ReductionDecomposition on the first.

issue

Now, using the acc script v1.1, generate new (v2) writeC code, the old makefile, wrapper and verification code with the following command:

$ acc -m -v1 pre.alpha -v2 post.alpha
[acc]: reading 'post.alpha' file
[acc]: created './energyEquations.c' file
[acc]: reading 'pre.alpha' file
[acc]: created './energyEquations.ab' file
[acc]: reading './energyEquations.ab' file
[CLooG] INFO: 1 dimensions (over 3) are scalar.
[CLooG] INFO: 1 dimensions (over 5) are scalar.
[CLooG] INFO: 1 dimensions (over 2) are scalar.
[CLooG] INFO: 1 dimensions (over 3) are scalar.
[CLooG] INFO: 1 dimensions (over 3) are scalar.
[acc]: created './energyEquations-wrapper.c' file
[acc]: created './energyEquations_verify.c' file
[acc]: created './Makefile' file
[acc]: building with make
cc energyEquations.c -o energyEquations.o -O3  -std=c99  -I/usr/include/malloc/ -lm -c
clang: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
cc energyEquations-wrapper.c -o energyEquations energyEquations.o  -O3  -std=c99  -I/usr/include/malloc/ -lm
cc energyEquations-wrapper.c -o energyEquations.check energyEquations.o  -O3  -std=c99  -I/usr/include/malloc/ -lm -DCHECKING
cc energyEquations_verify.c -o energyEquations_verify.o -O3  -std=c99  -I/usr/include/malloc/ -lm -c
clang: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
cc energyEquations-wrapper.c -o energyEquations.verify energyEquations.o  energyEquations_verify.o -O3  -std=c99  -I/usr/include/malloc/ -lm -DVERIFY
cc energyEquations-wrapper.c -o energyEquations.verify-rand energyEquations.o  energyEquations_verify.o -O3  -std=c99  -I/usr/include/malloc/ -lm -DVERIFY -DRANDOM

Run the verification code to compare the new writeC code (for post.alpha) with the old writeC code (for pre.alpha) and observe that there are errors:

./energyEquations.verify-rand 100
Execution time : 0.005788 sec.
TEST for B FAILED. #Errors: 4472

no issue for new and old writeC from same input

Now generate new and old writeC from the same input, either both pre.alpha or both post.alpha, run the verification code, and observe no errors:

$ acc -m -v1 pre.alpha -v2 pre.alpha
… # output omitted
$ ./energyEquations.verify-rand 100
Execution time : 0.007257 sec.
TEST for B PASSED
$ 
$ acc -m -v1 post.alpha -v2 post.alpha
… # output omitted
$ ./energyEquations.verify-rand 100
Execution time : 0.006014 sec.
TEST for B PASSED

problem

I think the problem is with the ReductionDecomposition step. Apparently, the programs pre.alpha and post.alpha are not the same, when they should be. I don’t think the issue is with the code generator, since codegen for either pre.alpha or post.alpha individually computes the same result as old alphaz (unless the same bug is also present in old alphaz, which seems unlikely).

lnarmour commented 3 months ago

After looking into this further, the behaviour is due to floating-point round-off error. The new code generator defaults the data type to 32-bit float, but the epsilon used by the old wrapper, 1e-9 (it assumes 64-bit doubles are used), is too small. We could probably make the epsilon a function of the data type, but this still isn’t perfect, since floating-point noise depends largely on the particular program and the range of computed values.
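For illustration, here is a rough sketch of what a type-aware tolerance could look like. This is not the check the acc wrapper actually performs; the function names `close_abs` and `close_rel` and the `factor` parameter are hypothetical, and the choice to scale by the larger magnitude is just one common convention.

```c
#include <float.h>
#include <stdbool.h>

/* Small helpers so the sketch does not depend on libm. */
static double abs_diff(double x, double y) { return x > y ? x - y : y - x; }
static double magnitude(double x) { return x < 0 ? -x : x; }

/* Fixed absolute tolerance, e.g. eps = 1e-9 (assumed form of the old check). */
bool close_abs(double expected, double actual, double eps) {
    return abs_diff(expected, actual) <= eps;
}

/* Hypothetical relative check: the tolerance scales with the machine epsilon
 * of the data type (FLT_EPSILON or DBL_EPSILON) and with the magnitude of
 * the values being compared. 'factor' is a slack multiplier that would have
 * to be tuned per program. */
bool close_rel(double expected, double actual, double type_eps, double factor) {
    double scale = magnitude(expected);
    if (magnitude(actual) > scale) scale = magnitude(actual);
    if (scale < 1.0) scale = 1.0;   /* avoid a vanishing tolerance near zero */
    return abs_diff(expected, actual) <= factor * type_eps * scale;
}
```

With 32-bit floats near 1000, adjacent representable values are already about 6e-5 apart, so `close_abs(x, y, 1e-9)` rejects any pair that is not bit-identical, whereas `close_rel(x, y, FLT_EPSILON, 16.0)` accepts values a few units in the last place apart. As noted above, a suitable `factor` still depends on the program and the range of computed values.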

lnarmour commented 3 months ago

This is indirectly addressed by PR #64.

If/when you notice that verification reports wrong answers, you should try rerunning everything with exact precision (i.e., int or long data types) and retesting with a small problem size (being careful about overflow issues). This will let you rule in/out FP round-off errors as the root cause.
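As a hand-written sketch of that kind of check (not writeC output), the two reductions from pre.alpha and post.alpha can be written as loop nests over an exact integer type. The function names `reduce_flat` and `reduce_nested` are made up for this example, and the loops assume the reduction body domain `i <= k <= l <= j` from the programs above.

```c
/* Single reduction over (k, l), as in pre.alpha:
 * accumulate A[i] once for every point with i <= k <= l <= j. */
long reduce_flat(const long *A, int i, int j) {
    long acc = 0;
    for (int k = i; k <= j; k++)
        for (int l = k; l <= j; l++)
            acc += A[i];
    return acc;
}

/* Nested reductions, as in post.alpha:
 * the inner reduction sums over l for each k, the outer one sums over k. */
long reduce_nested(const long *A, int i, int j) {
    long outer = 0;
    for (int k = i; k <= j; k++) {
        long inner = 0;
        for (int l = k; l <= j; l++)
            inner += A[i];
        outer += inner;
    }
    return outer;
}
```

With `long`, both functions return the same value for every `(i, j)` as long as nothing overflows, because integer addition is associative. If the integer versions agree but the float versions disagree, round-off is the likely cause rather than the transformation.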

rajopadhye commented 3 months ago

Indeed. Floating-point arithmetic is not associative, so strictly speaking, we should not be using this data type. We should state this at the outset (and possibly provide a nasty reviewer the ammunition they need to reject our papers :-)
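A minimal sketch of the non-associativity, assuming IEEE 754 single precision with round-to-nearest and no value-changing compiler flags such as `-ffast-math`. The function names are hypothetical; intermediates are stored in `float` variables so that each step is rounded to single precision.

```c
/* (a + b) + c, with each intermediate rounded to float. */
float sum_left(float a, float b, float c) {
    float ab = a + b;
    float result = ab + c;
    return result;
}

/* a + (b + c), with each intermediate rounded to float. */
float sum_right(float a, float b, float c) {
    float bc = b + c;
    float result = a + bc;
    return result;
}
```

For `a = 1e8f`, `b = -1e8f`, `c = 1.0f`, the left grouping yields `1.0f`, while the right grouping yields `0.0f`: adjacent floats near 1e8 are 8 apart, so `-1e8f + 1.0f` rounds back to `-1e8f`. Regrouping a reduction, which is what ReductionDecomposition does, can change a float result in the same way.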

CSU-CS-Melange / alpha-language