inconsistency handling associated fields

vsouvan commented 4 years ago

Not sure if it's compiler dependent or architecture... we've been having long-term problems with the test case BUFR/AMDAR+2xUS-v15.bufr and its use of associated fields. On some platforms, we get different output. For example, on my 32-bit oneiric system, the decoder debug output contains:

Code: 001111 AFD: (0 bits) 01000001 01000001 01000001 STR: [AAA] (24 bits)
Code: 001112 AFD: (0 bits) 01000010 01000010 01000010 STR: [BBB] (24 bits)
Code: 204002
Code: 031021 AFD: (0 bits) 001000 IVAL=8 (0 bits)
Code: 004001 AFD: (2 bits) 00 AFD: 0x0 (0 bits) 011111011001 IVAL=2009 (12 bits)
Code: 004002 AFD: (2 bits) 00 AFD: 0x0 (0 bits) 0011 IVAL=3 (4 bits)
Code: 004003 AFD: (2 bits) 00 AFD: 0x0 (0 bits) 010101 IVAL=21 (6 bits)
Code: 004004 AFD: (2 bits) 00 AFD: 0x0 (0 bits) 01010 IVAL=10 (5 bits)

While the 64-bit oneiric system gives:

Code: 001111 AFD: (0 bits) 01000001 01000001 01000001 STR: [AAA] (24 bits)
Code: 001112 AFD: (0 bits) 01000010 01000010 01000010 STR: [BBB] (24 bits)
Code: 204002
Code: 031021 AFD: (0 bits) 001000 IVAL=8 (6 bits)
Code: 004001 AFD: (2 bits) 00 AFD: 0x0 (2 bits) 011111011001 IVAL=2009 (12 bits)
Code: 004002 AFD: (2 bits) 00 AFD: 0x0 (2 bits) 0011 IVAL=3 (4 bits)
Code: 004003 AFD: (2 bits) 00 AFD: 0x0 (2 bits) 010101 IVAL=21 (6 bits)
Code: 004004 AFD: (2 bits) 00 AFD: 0x0 (2 bits) 01010 IVAL=10 (5 bits)

The two outputs diverge right from the interpretation of the 204002 descriptor, and that eventually leads to different output values.

Imported from Launchpad using lp2gh.

date created: 2011-11-21T17:00:51Z
owner: chris-beauregard
assignee: chris-beauregard
the launchpad url was https://bugs.launchpad.net/bugs/893198

vsouvan commented 4 years ago

(by chris-beauregard) Part of the problem is a 32/64 bit issue. For example, take a line from bufr_dataset.c like:

sprintf( errmsg, _("IVAL=%ld "), val );

val is of type int64_t.

The problem is that depending on the architecture, either %ld or %lld is appropriate. And when you get into constructs like:

     sprintf( errmsg, _n("AFD: 0x%lx (%d bit) ", "AFD: 0x%lx (%d bits) ",  bd->value->af->nbits),
           bd->value->af->bits, bd->value->af->nbits );

where we have a mixture of uint64_t and uint16_t, things get messy.

Fixing this correctly requires using the inttypes.h header with its predefined format macros. For example, to output an int64_t, you'd want to do sprintf(buf, "%" PRId64 "\n", val);
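As a rough sketch of what the inttypes.h approach looks like for the two messages quoted above (the helper names print_ival and print_afd are made up for illustration and are not from bufr_dataset.c; the gettext markup is left out here):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats a signed 64-bit value the same way on
 * 32-bit and 64-bit builds. PRId64 expands to the right length modifier
 * ("ld", "lld", ...) for the platform. */
int print_ival(char *buf, size_t len, int64_t val)
{
    return snprintf(buf, len, "IVAL=%" PRId64 " ", val);
}

/* Hypothetical helper: mixes a uint64_t and a uint16_t argument, each
 * with its own matching macro. */
int print_afd(char *buf, size_t len, uint64_t bits, uint16_t nbits)
{
    return snprintf(buf, len, "AFD: 0x%" PRIx64 " (%" PRIu16 " bits) ",
                    bits, nbits);
}
```

Calling print_ival(buf, sizeof buf, 2009) should leave "IVAL=2009 " in buf regardless of the width of long.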

Unfortunately, this does NOT integrate nicely with gettext() markup like _("IVAL=%ld "), which means things will need to be done using temp buffers. Mind you, much of that kind of translation markup might be inappropriate for the fields being output.
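One way the temp-buffer idea could look, as a sketch only: the number is formatted first with the inttypes.h macro, and the translatable string then only needs a %s. The _() macro below is a do-nothing stand-in for the real gettext wrapper, and format_ival_translated is a hypothetical name.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: stand-in for the project's gettext() wrapper, so the
 * sketch compiles without libintl. */
#define _(s) (s)

/* Hypothetical helper: keeps the platform-specific format macro out of
 * the translatable string by formatting the number into a temp buffer. */
int format_ival_translated(char *out, size_t outlen, int64_t val)
{
    char num[32]; /* enough for any 64-bit decimal value plus sign */

    snprintf(num, sizeof num, "%" PRId64, val);
    return snprintf(out, outlen, _("IVAL=%s "), num);
}
```

The translatable string becomes "IVAL=%s ", which is identical on every architecture, at the cost of an extra buffer and call per message.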

vsouvan commented 4 years ago

(by chris-beauregard) For some types, we can "fix" the problem by casting fields to types which are guaranteed to be at least as large. For example:

uint16_t val16;
int64_t val64;
sprintf(buf,"%u", (unsigned) val16);
sprintf(buf,"%lld", (long long) val64);
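Applied to the AFD message from earlier in the thread, the cast approach might look roughly like this (format_afd_cast is a hypothetical helper, not the actual fix that went into bufr_dataset.c):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: casts each argument to a standard type that is
 * guaranteed to be at least as wide, so plain %llx and %u are correct
 * on both 32-bit and 64-bit builds. */
int format_afd_cast(char *buf, size_t len, uint64_t bits, uint16_t nbits)
{
    return snprintf(buf, len, "AFD: 0x%llx (%u bits) ",
                    (unsigned long long) bits, (unsigned) nbits);
}
```

Because the format string contains no macros, it can stay inside gettext markup unchanged.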

vsouvan commented 4 years ago

(by chris-beauregard) I suppose it'd help to point out that -Wformat indicates where we've got a problem:

bufr_dataset.c: In function 'bufr_put_numeric_compressed':
bufr_dataset.c:1056:16: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'uint64_t' [-Wformat]
bufr_dataset.c:1056:16: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'uint64_t' [-Wformat]
bufr_dataset.c:1075:16: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'uint64_t' [-Wformat]
bufr_dataset.c:1075:16: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'uint64_t' [-Wformat]
bufr_dataset.c:1095:13: warning: format '%lx' expects argument of type 'long unsigned int', but argument 4 has type 'uint64_t' [-Wformat]
bufr_dataset.c:1095:13: warning: format '%lx' expects argument of type 'long unsigned int', but argument 4 has type 'uint64_t' [-Wformat]
bufr_dataset.c: In function 'bufr_put_ieeefp_compressed':
...

vsouvan commented 4 years ago

(by chris-beauregard) Fixed. Regression tests pass on all my build platforms: 32-bit etch, lenny, squeeze, lucid, maverick, natty, oneiric, and 64-bit oneiric.

ECCC-MSC / libecbufr

inconsistency handling associated fields #45