gnudatalanguage / gdl

GDL - GNU Data Language
GNU General Public License v2.0
277 stars 61 forks source link

`byte_conversion`, `bytscl` and `formats` tests fail on Apple Silicon CI #1876

Open slayoo opened 3 months ago

slayoo commented 3 months ago

@gnudatalanguage/gdldevs, anyone having access to an Apple Silicon machine, help welcome! Thanks

Example output (https://github.com/gnudatalanguage/gdl/actions/runs/10431443505/job/28891119074)

66/214 Test  #66: test_byte_conversion.pro ...........***Failed    0.65 sec
...
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for :       -32765
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for :       -32766
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for :       -32767
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for :       -32769
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for :       -32770
...
 67/214 Test  #67: test_bytscl.pro ....................***Failed    0.69 sec
...
% TEST_BYTSCL_TOP: Error on operation : Pb with top=     -128, min/max :        0,        0
...
102/214 Test #102: test_formats.pro ...................***Failed    1.13 sec
...
% When using the RAN1 mode, be sure to keep the RAN1 and dSFMT seed arrays in separate variables.
multiple reference file <<formats.GDL>> found ! First used !!
/Users/runner/work/gdl/build/testsuite/formats.GDL
/Users/runner/work/gdl/gdl/testsuite/formats.GDL
Files to be compared : formats.IDL, formats.GDL
...
% TEST_FORMATS: =  1595 errors encountered during TEST_FORMATS tests  =
...
The following tests FAILED:
     66 - test_byte_conversion.pro (Failed)
     67 - test_bytscl.pro (Failed)
    102 - test_formats.pro (Failed)
alaingdl commented 3 months ago

OK, but too much troubles on OSX now, I need some quiet time to investigate :( (short answer : when compilating without the script, only test_formats is broken --but I experienced glitches with the tests, I don't know why-- when compilating with the script, maybe due to wxwidget, everything is broken)

GillesDuvert commented 2 months ago

see #1881

slayoo commented 2 weeks ago

as of now, we have the following failures on CI:

The test_hdf failure is:

112/214 Test #112: test_hdf5.pro ......................***Failed    0.47 sec
% Compiled module: TEST_HDF5.
% Compiled module: FILE_SEARCH_FOR_TESTSUITE.
% Compiled module: BANNER_FOR_TESTSUITE.
% Compiled module: GDL_IDL_FL.
% TEST_HDF5_BYTE_ARRAY: 
  NO errors encountered during TEST_HDF5_BYTE_ARRAY tests  
% TEST_HDF5_STRING:   NO errors encountered during TEST_HDF5_STRING tests  
h5dump error: unable to open attribute "attr-02"
% TEST_HDF5_ATTR:   3 errors encountered during TEST_HDF5_ATTR tests  
% TEST_HDF5_DATA:   NO errors encountered during TEST_HDF5_DATA tests  
% H5G_CREATE: Bad value
% TEST_HDF5_OBJ_INFO:   NO errors encountered during TEST_HDF5_OBJ_INFO tests  
% TEST_HDF5_COMP:   NO errors encountered during TEST_HDF5_COMP tests  
% H5G_CREATE: Bad value
% TEST_HDF5: =======================================================
% TEST_HDF5: =                                                     =
% TEST_HDF5: =  3.00000 errors encountered during TEST_HDF5 tests  =
% TEST_HDF5: =                                                     =
% TEST_HDF5: =======================================================
pjb7687 commented 2 weeks ago

I also noticed this we need to fix it and release v1.1.1 asap

slayoo commented 2 weeks ago

Duplicate of #1883 and could be related with: #1899?

GillesDuvert commented 2 weeks ago

@slayoo #1904 shows NO ERRORS on macos-12 (when did you see an error ?)

So on osx 14 = ARM64 = Apple Silicon = M1/M2/M3 the culprit could be that the HDF5 library has the same problem as GDL, i.e., #1881

I have no currently a M1 machine at hand to handle #1881 but it would serve nothing if the HDF5 library is rotten there...

slayoo commented 2 weeks ago

@slayoo https://github.com/gnudatalanguage/gdl/pull/1904 shows NO ERRORS on macos-12 (when did you see an error ?)

Yesterday's macos-12 build for the master branch has test_hdf5 failed: https://github.com/gnudatalanguage/gdl/actions/runs/11681438801/job/32526479047

% TEST_HDF5_BYTE_ARRAY: 
  NO errors encountered during TEST_HDF5_BYTE_ARRAY tests  
% TEST_HDF5_STRING:   NO errors encountered during TEST_HDF5_STRING tests  
h5dump error: unable to open attribute "attr-02"
% TEST_HDF5_ATTR:   3 errors encountered during TEST_HDF5_ATTR tests  
% TEST_HDF5_DATA:   NO errors encountered during TEST_HDF5_DATA tests  
% H5G_CREATE: Bad value
jkohnert commented 1 day ago

I get the test_hdf5.pro error , too on a local build in Arch. It turns out to be an error in wait() within spawn. I made a small change in basic_pro.cpp:

diff --git a/src/basic_pro.cpp b/src/basic_pro.cpp
index 87c0e005..33a5c28f 100644
--- a/src/basic_pro.cpp
+++ b/src/basic_pro.cpp
@@ -2142,7 +2142,9 @@ static DWORD launch_cmd(BOOL hide, BOOL nowait,

         // wait until child terminates
         int status;
-        pid_t wpid = wait(&status);
+        if (wait(&status) == -1) {
+          Warning(DString("SPAWN: Error waiting for child process: ") + strerror(errno));
+        }

         if (exit_statusKeyword)
           e->SetKW(exit_statusIx, new DLongGDL(status >> 8));

to track down whats happening. At least in my local build, wait() returns -1 and sets errno to ECHILD, this is what this small change reveals.

Since test_hdf5.pro uses spawn, and status does not get set to the return value of the spawned command due to the error in wait(), the test fails on my maschine.

However, I could not yet figure why I get ECHILD in the first place.

Maybe someone else has a clue, I'll try to dig into this a bit more, too.

Best Jan

BTW: In https://github.com/gnudatalanguage/gdl/pull/1913 build on MacOS 14 fails with this error, too: https://github.com/gnudatalanguage/gdl/actions/runs/11942416687/job/33289421103?pr=1913

alaingdl commented 1 day ago

As I mentioned in #1910 we do have a trouble in SPAWN for wait()

I am not sure it is the best way to just make a small patch without trying to do a good management of wait() and SPAWN ... But I am too bad in C to understand https://man7.org/linux/man-pages/man2/waitpid.2.html

I also check that this is the only occurrence of wait()

jkohnert commented 1 day ago

I tried waitpid(pid, &status, 0) while debugging, this doesn't change anything... There must be something else broken. :sob: