gnudatalanguage / gdl

GDL - GNU Data Language
GNU General Public License v2.0

test_elmhes and test_formats fail on non-x86_64 #1833

Open opoplawski opened 1 month ago

opoplawski commented 1 month ago

Working on updating the Fedora package to 1.0.5 and getting:

        Start  82: test_elmhes.pro
82: Test command: /builddir/build/BUILD/gdl-v1.0.5/build/src/gdl "-quiet" "-e" "if execute('test_elmhes') ne 1 then exit, status=1"
82: Working Directory: /builddir/build/BUILD/gdl-v1.0.5/build/testsuite
82: Environment variables: 
82:  LC_COLLATE=C
82:  GDL_PATH=/builddir/build/BUILD/gdl-v1.0.5/testsuite/:/builddir/build/BUILD/gdl-v1.0.5/src/pro/
82:  GDL_STARTUP=
82:  IDL_STARTUP=
82: Test timeout computed to be: 3600
82: % Compiled module: TEST_ELMHES.
82: % Compiled module: ERRORS_ADD.
82: % TEST_ELMHES: Error on operation : bad result elmhes
82: % TEST_ELMHES: Error on operation : bad result elmhes,/no_balance
82: % TEST_ELMHES: Error on operation : bad result elmhes,/column
82: % Compiled module: BANNER_FOR_TESTSUITE.
82: % Compiled module: GDL_IDL_FL.
82: % TEST_ELMHES: ===================================================
82: % TEST_ELMHES: =                                                 =
82: % TEST_ELMHES: =  3 errors encountered during TEST_ELMHES tests  =
82: % TEST_ELMHES: =                                                 =
82: % TEST_ELMHES: ===================================================
 82/212 Test  #82: test_elmhes.pro ....................***Failed    0.17 sec
        Start 100: test_formats.pro
100: Test command: /builddir/build/BUILD/gdl-v1.0.5/build/src/gdl "-quiet" "-e" "if execute('test_formats') ne 1 then exit, status=1"
100: Working Directory: /builddir/build/BUILD/gdl-v1.0.5/build/testsuite
100: Environment variables: 
100:  LC_COLLATE=C
100:  GDL_PATH=/builddir/build/BUILD/gdl-v1.0.5/testsuite/:/builddir/build/BUILD/gdl-v1.0.5/src/pro/
100:  GDL_STARTUP=
100:  IDL_STARTUP=
100: Test timeout computed to be: 3600
100: % Compiled module: TEST_FORMATS.
100: % Compiled module: GDL_IDL_FL.
100: % GDL_IDL_FL: Detected Software : GDL
100: % When using the RAN1 mode, be sure to keep the RAN1 and dSFMT seed arrays in separate variables.
100: multiple reference file <<formats.GDL>> found ! First used !!
100: /builddir/build/BUILD/gdl-v1.0.5/build/testsuite/formats.GDL
100: /builddir/build/BUILD/gdl-v1.0.5/testsuite/formats.GDL
100: Files to be compared : formats.IDL, formats.GDL
100: % Compiled module: BANNER_FOR_TESTSUITE.
100: % TEST_FORMATS: =======================================================
100: % TEST_FORMATS: =                                                     =
100: % TEST_FORMATS: =  1595 errors encountered during TEST_FORMATS tests  =
100: % TEST_FORMATS: =                                                     =
100: % TEST_FORMATS: =======================================================
100/212 Test #100: test_formats.pro ...................***Failed    0.65 sec

alaingdl commented 1 month ago

Thanks @opoplawski

Looking at the code of test_elmhes.pro, and given the way the tests are done internally, I think these 2 failures (test_elmhes & test_formats) are both related to an issue in formats :(

I have no way to test on my side on a recent Fedora, and I have no problem on Debian, Ubuntu & OSX !

Which compiler version do you have?

thanks

opoplawski commented 1 month ago

This is with gcc 14.1.1. But it's also failing on EL9 with 11.4.1. You can check recent build logs here: https://koji.fedoraproject.org/koji/packageinfo?packageID=1830

GillesDuvert commented 1 month ago

I'm pretty sure formats won't be OK on non-64-bit machines, so some tests based on formatted string comparison won't work either. The thing is, nobody in the team knows what GDL should produce on 32 bit machines! I would suggest not running these tests on 32 bit machines, as a failure there does not mean that GDL does not work, and waiting for a user to report a specific issue on a 32 bit machine.

opoplawski commented 1 month ago

These are all 64 bit architectures - aarch64, ppc64le, s390x

GillesDuvert commented 1 month ago

> These are all 64 bit architectures - aarch64, ppc64le, s390x

@opoplawski sorry, but your issue refers to "non-x86_64" architectures. My comment above holds: better to remove these tests from the list of tests when building on "non-x86_64" architectures, as they are meaningless.

opoplawski commented 1 month ago

I was just responding to your comment about 32-bits. But if the tests only apply to x86_64 that's fine. Although it would be nice if the tests could deselect themselves on non-x86_64. Anyway, I'm excluding them now.
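A minimal sketch of how such an exclusion could be driven from the packaging side, assuming the test names shown in the ctest log above and ctest's `-E` (exclude by regular expression) option. The helper name `ctest_args` and the choice of keying on `platform.machine()` are illustrative assumptions and are not part of the GDL build system.

```python
import platform

# Tests reported as failing in this issue (names taken from the ctest log).
ARCH_SENSITIVE_TESTS = ["test_elmhes", "test_formats"]

def ctest_args(machine):
    """Build a ctest command line that skips the arch-sensitive tests off x86_64.

    `machine` is an architecture string such as the one returned by
    platform.machine() (hypothetical convention for this sketch).
    """
    args = ["ctest", "--output-on-failure"]
    if machine != "x86_64":
        # -E excludes every test whose name matches the regular expression.
        args += ["-E", "|".join(ARCH_SENSITIVE_TESTS)]
    return args

print(" ".join(ctest_args(platform.machine())))
```

Note that `platform.machine()` spells architectures differently across systems (for example `arm64` on macOS versus `aarch64` on Linux), so a real exclusion rule would need to account for that.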

GillesDuvert commented 1 month ago

Thanks @opoplawski, but I feel there is a misunderstanding: according to the internet, s390x is a 32 bit machine whereas aarch64 is not. I do expect trouble on 32 bit machines, since we have no such machine with a working IDL at our disposal to cross-check, but there should be no problem on 64 bit little or big endian IEEE 754 architectures. So your report of a test failure is important in this case.

opoplawski commented 1 month ago

s390x is definitely a 64 bit architecture: https://developer.fedoraproject.org/deployment/secondary_architectures/s390.html. s390 is 31/32 bit hybrid. I'll reopen then I guess. Let me know what other information would be helpful for tracking this down.
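For anyone wanting to confirm word size and byte order on a given build host, a quick check could look like the sketch below. It reports the properties of the running Python interpreter, which normally match the host architecture; the function name `describe_host` is only illustrative.

```python
import struct
import sys

def describe_host():
    """Return (pointer width in bits, byte order) for the running interpreter."""
    pointer_bits = struct.calcsize("P") * 8  # size of a C pointer, in bits
    return pointer_bits, sys.byteorder       # byte order is "little" or "big"

bits, order = describe_host()
print(f"{bits}-bit, {order}-endian")
```

On s390x this should report a 64-bit, big-endian host, whereas aarch64 and ppc64le builds are 64-bit little-endian.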

slayoo commented 1 month ago

@opoplawski, do I understand correctly that the tests pass OK on Fedora arm64 builds? In #1788, we are introducing Apple Silicon builds to CI, but the PR is blocked by two tests failing: test_byte_conversion.pro and test_bytscl.pro; if that is the case, it then seems to be an Apple compiler issue?

GillesDuvert commented 1 month ago

To go further, one needs at least to know what fails. With 1595 errors in test_formats, I guess every format is wrong. The test procedure creates a file "formats.GDL". @opoplawski could you send it? For Apple Silicon, I have access to an M1, I just need to find the time.
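To narrow down which formats differ, the generated file can be compared against the reference line by line. The sketch below assumes that formats.IDL and formats.GDL are plain text files whose lines correspond one to one, which the log suggests but does not confirm; `differing_lines` is a hypothetical helper, not part of the testsuite.

```python
from itertools import zip_longest

def differing_lines(path_a, path_b):
    """Yield (line number, line from a, line from b) wherever the files differ.

    A file that ends early is padded with empty strings, so extra trailing
    lines in the other file are reported as differences too.
    """
    with open(path_a) as fa, open(path_b) as fb:
        pairs = zip_longest(fa, fb, fillvalue="")
        for number, (a, b) in enumerate(pairs, start=1):
            a, b = a.rstrip("\n"), b.rstrip("\n")
            if a != b:
                yield number, a, b
```

Iterating over `differing_lines("formats.IDL", "formats.GDL")` in the testsuite build directory and printing each tuple would then list only the mismatching entries.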

alaingdl commented 1 month ago

OK, I just compiled current git version on a new M2 machine (OSX) and I have the same issues : test_elmhes.pro and test_formats.pro (I will look at test_formats later !)

On x86 processor, IDL & GDL give (first test) :

P               DOUBLE    =   -2.8958759e-07
PT              STRING    = '-00.00000029'
ST              STRING    = '101.32080078'
T               FLOAT     =       101.321
GDL> print, b
     0.500000      11.4800      5.50000      5.00000
      6.25000      30.2200      20.7500      14.5000
     0.680000      3.02080      1.28000      1.28000
     0.360000     0.500000      0.00000      0.00000

But on M2:

P               DOUBLE    =        0.0000000
PT              STRING    = '000.00000000'
ST              STRING    = '101.32079315'
T               FLOAT     =       101.321

GDL> print, b
     0.500000      11.4800      5.50000      5.00000
      6.25000      30.2200      20.7500      14.5000
     0.680000      3.02080      1.28000      1.28000
     0.360000     0.500000      0.00000      0.00000

So from my point of view this is just numerical rounding, and the test should be rewritten to take EPS into account.
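As a rough illustration of how small the discrepancy is, the sketch below measures the gap between the two ST values above in units of single-precision spacing. It assumes both values are meant as IEEE 754 single-precision numbers (T is a FLOAT); the helper names are illustrative.

```python
import struct

def float32_bits(x):
    """Round x to IEEE 754 single precision and return its bit pattern as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def ulp_distance(a, b):
    """How many float32 steps apart two finite values of the same sign are."""
    return abs(float32_bits(a) - float32_bits(b))

x86_value = 101.32080078  # ST as printed on x86
m2_value = 101.32079315   # ST as printed on M2
print(ulp_distance(x86_value, m2_value))  # prints 1
```

The two printed strings correspond to adjacent single-precision values, which is consistent with a rounding difference rather than a wrong algorithm.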

GillesDuvert commented 1 month ago

Certainly. Cumulative rounding errors make our results differ between machines and, most of all, differ from IDL, which does not use the same algorithms. The difficulty is to fix a safe error margin, as precision can well drop to 10^-3 for floats.

alaingdl commented 1 month ago

I updated test_elmhes.pro in PR #1840 with a numerical tolerance of 1e-5. For me it is close.
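A sketch of what a tolerance-based comparison looks like, written in Python rather than GDL and not taken from the PR. The 1e-5 figure comes from the comment above; applying it as both a relative and an absolute tolerance is an assumption of this example, made because a purely relative test cannot accept the P value that is exactly 0 on M2.

```python
import math

def close_enough(actual, expected, tol=1e-5):
    """Accept values that agree within tol, relatively or absolutely."""
    return math.isclose(actual, expected, rel_tol=tol, abs_tol=tol)

# ST values from the x86 and M2 runs above.
print(close_enough(101.32079315, 101.32080078))         # True
# P values: 0 on M2, a tiny negative number on x86.
print(close_enough(0.0, -2.8958759e-07))                # True
# A relative tolerance alone rejects any comparison against exact zero.
print(math.isclose(0.0, -2.8958759e-07, rel_tol=1e-5))  # False
```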

Concerning test_formats.pro, from what I see in the outputs, we do have a big/little endian problem ... It is a serious issue. The good news is that I now have permanent access to an M2 OSX machine (very fast feed). But I have no time now, and I do not feel competent on that. But maybe a simple flag could solve most of the problems. I hope @GillesDuvert will have time for that since he previously improved formats ...

GillesDuvert commented 1 month ago

The only differences are on unsigned 32 and bits ints and +/-NaN and +INF. I would not say it is an endianness problem.
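To make the distinction concrete, the sketch below shows what a genuine byte-order mix-up does to a 32-bit unsigned integer, and how the sign bit of a NaN can be inspected. It is only an illustration of the two effects, not a diagnosis of the GDL code, and the helper names are hypothetical.

```python
import math
import struct

def byteswapped_u32(value):
    """Reinterpret a 32-bit unsigned int as if its bytes were read in the wrong order."""
    return struct.unpack(">I", struct.pack("<I", value))[0]

def sign_bit(x):
    """True when the IEEE 754 sign bit is set; works for NaN, where `x < 0` does not."""
    return math.copysign(1.0, x) < 0

# Two quiet NaNs that differ only in the sign bit.
pos_nan = struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000000))[0]
neg_nan = struct.unpack("<d", struct.pack("<Q", 0xFFF8000000000000))[0]

print(hex(byteswapped_u32(0x00000001)))      # prints 0x1000000
print(sign_bit(pos_nan), sign_bit(neg_nan))  # prints False True
```

A byte-order bug would be expected to disturb multi-byte values of every type, whereas a difference limited to the sign of NaN or to particular integer types points at something narrower.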