Valgrind error with decimal_arith

jclark commented 2 years ago

Running valgrind on decimal_arith gives:

==1482255== Conditional jump or move depends on uninitialised value(s)
==1482255==    at 0x409179: decQuadFMA (decBasic.c:2080)
==1482255==    by 0x408C08: decDivide (decBasic.c:658)
==1482255==    by 0x402CE1: _bal_decimal_rem (decimal.c:66)
==1482255==    by 0x40200E: genAndValidateDecRem (decimal_arith.c:153)
==1482255==    by 0x4020BC: testRem (decimal_arith.c:159)
==1482255==    by 0x4029F3: main (decimal_arith.c:243)
==1482255==  Uninitialised value was created by a stack allocation
==1482255==    at 0x408E55: decQuadFMA (decBasic.c:1998)
==1482255== 
==1482255== Conditional jump or move depends on uninitialised value(s)
==1482255==    at 0x4095BE: decQuadFMA (decBasic.c:2245)
==1482255==    by 0x408C08: decDivide (decBasic.c:658)
==1482255==    by 0x402CE1: _bal_decimal_rem (decimal.c:66)
==1482255==    by 0x40200E: genAndValidateDecRem (decimal_arith.c:153)
==1482255==    by 0x40221F: testRem (decimal_arith.c:175)
==1482255==    by 0x4029F3: main (decimal_arith.c:243)
==1482255==  Uninitialised value was created by a stack allocation
==1482255==    at 0x408E50: decQuadFMA (decBasic.c:1998)

I built runtime using CFLAGS="-g -O2" make in order to get line numbers.

jclark commented 2 years ago

I can't see anything in our code that could be triggering this, but decNumber is reputedly very solid.

@manuranga Can you see anything?

manuranga commented 2 years ago

Error went away when I forced all stack variables to initialized to zero in decQuad.o using clang flag -ftrivial-auto-var-init=zero. I am still looking into which exact stack variable is responsible.

manuranga commented 2 years ago

Following change got rid of the error, but it's still not clear why.

index 1f680c4a..b306f68d 100644
--- a/runtime/third-party/decNumber/decBasic.c
+++ b/runtime/third-party/decNumber/decBasic.c
@@ -2001,7 +2001,7 @@ decFloat * decFloatFMA(decFloat *result, cons
   // one byte to the left in case of carry, plus DECPMAX+2 to the
   // right for the final addition (up to full fhs + round & sticky
   #define FMALEN (ROUNDUP4(1+ (DECPMAX9*18+1) +DECPMAX+2))
-  uByte  acc[FMALEN];              // for multiplied coefficient i
+  uByte  acc[FMALEN] = { 0 };      // for multiplied coefficient i
                                    // .. and for final result
   bcdnum mul;                      // for multiplication result
   bcdnum fin;                      // for final operand, expanded

jclark commented 2 years ago

I also tried compiling decNumber with gcc to see whether this was a LLVM issue, and I get the same error. So I think we can conclude it is decNumber problem.

I suspect this harmless. The code is processing BCD digits 4 at a time (using a 32-bit int). My guess is that it is doing a transformation on some number of digits that is not a multiple of 4, which is leading to junk bytes being transformed into junk bytes.

In the absence of any evidence that this is a problem, I don't think we should spend any more time on this now. At some point, we should produce a small, self-contained example and report it upstream.

ballerina-platform / nballerina

Valgrind error with decimal_arith #731