OSGeo / PROJ

PROJ - Cartographic Projections and Coordinate Transformations Library
https://proj.org

9.3.0-rc1 fails to build on s390x (big endian) due to test failures #3862

Closed: sebastic closed this issue 1 year ago

sebastic commented 1 year ago

Example of problem

[...]
      Start  9: 4D-API-cs2cs-style

9: Test command: /<<PKGBUILDDIR>>/obj-s390x-linux-gnu/bin/gie "/<<PKGBUILDDIR>>/test/gie/4D-API_cs2cs-style.gie"
9: Working Directory: /<<PKGBUILDDIR>>/test
9: Environment variables: 
9:  PROJ_SKIP_READ_USER_WRITABLE_DIRECTORY=YES
9:  PROJ_DATA=/<<PKGBUILDDIR>>/obj-s390x-linux-gnu/data/for_tests
9: Test timeout computed to be: 1500
9: proj_create: Error 1025 (Invalid PROJ string syntax): pipeline: Pipeline: Mismatched units between step 1 and 2
9: proj_create: Error 1025 (Invalid PROJ string syntax): pipeline: Pipeline: Mismatched units between step 2 and 3
9: proj_create: Error 1025 (Invalid PROJ string syntax): pipeline: Pipeline: proj= operator before first step not allowed
9: proj_create: Error 1025 (Invalid PROJ string syntax): pipeline: Pipeline: o_proj= operator before first step not allowed
9: proj_create: nested pipeline not supported
9: proj_create: Error 1027 (Invalid value for an argument): longlat: Invalid value for vto_meter donominator
9: proj_create: unrecognized format / unknown name
9: -------------------------------------------------------------------------------
9: Reading file '/<<PKGBUILDDIR>>/test/gie/4D-API_cs2cs-style.gie'
9:      -----
9:      FAILURE in 4D-API_cs2cs-style.gie(449):
9:      expected: 4.05 52.1 -10
9:      got:      4.050000000000   52.100000000000   39.444400787
9:      deviation:  49444.400787 mm,  expected:  0.500000 mm
9:      -----
9:      FAILURE in 4D-API_cs2cs-style.gie(458):
9:      expected: 4.05 52.1 -10
9:      got:      4.050000000000   52.100000000000   39.444400787
9:      deviation:  49444.400787 mm,  expected:  0.500000 mm
9: -------------------------------------------------------------------------------
9: total: 76 tests succeeded,  0 tests skipped,  2 tests FAILED!
9: -------------------------------------------------------------------------------
10/60 Test  #9: 4D-API-cs2cs-style ...............***Failed    0.17 sec
[...]
54: [ RUN      ] GridTest.VerticalShiftGridSet_gtx
54: ./test/unit/test_grids.cpp:90: Failure
54: Value of: grid->isNodata(-88.8888f, 1.0)
54:   Actual: false
54: Expected: true
54: 
54: [  FAILED  ] GridTest.VerticalShiftGridSet_gtx (5 ms)
[...]
      Start 57: test_defmodel

57: Test command: /<<PKGBUILDDIR>>/obj-s390x-linux-gnu/bin/test_defmodel
57: Working Directory: /<<PKGBUILDDIR>>/obj-s390x-linux-gnu/test/unit
57: Environment variables: 
57:  PROJ_SKIP_READ_USER_WRITABLE_DIRECTORY=YES
57:  PROJ_DATA=/<<PKGBUILDDIR>>/obj-s390x-linux-gnu/data/for_tests
57:  PROJ_SOURCE_DATA=/<<PKGBUILDDIR>>/data
57: Test timeout computed to be: 1500
57: [==========] Running 13 tests from 1 test suite.
57: [----------] Global test environment set-up.
57: [----------] 13 tests from defmodel
57: [ RUN      ] defmodel.basic
57: [       OK ] defmodel.basic (0 ms)
57: [ RUN      ] defmodel.full
57: [       OK ] defmodel.full (0 ms)
57: [ RUN      ] defmodel.error_cases
57: [       OK ] defmodel.error_cases (3 ms)
57: [ RUN      ] defmodel.ISO8601ToDecimalYear
57: [       OK ] defmodel.ISO8601ToDecimalYear (0 ms)
57: [ RUN      ] defmodel.evaluate_constant
57: [       OK ] defmodel.evaluate_constant (0 ms)
57: [ RUN      ] defmodel.evaluate_velocity
57: [       OK ] defmodel.evaluate_velocity (0 ms)
57: [ RUN      ] defmodel.evaluate_step
57: [       OK ] defmodel.evaluate_step (0 ms)
57: [ RUN      ] defmodel.evaluate_reverse_step
57: [       OK ] defmodel.evaluate_reverse_step (0 ms)
57: [ RUN      ] defmodel.evaluate_piecewise
57: [       OK ] defmodel.evaluate_piecewise (2 ms)
57: [ RUN      ] defmodel.evaluate_exponential
57: [       OK ] defmodel.evaluate_exponential (0 ms)
57: [ RUN      ] defmodel.evaluator_horizontal_unit_degree
57: [       OK ] defmodel.evaluator_horizontal_unit_degree (0 ms)
57: [ RUN      ] defmodel.evaluator_horizontal_unit_metre
57: ./test/unit/test_defmodel.cpp:1378: Failure
57: The difference between de and tFactor * expected_de is 2.249935682208104e-09, which exceeds 1e-10, where
57: de evaluates to 0.20000000249047095,
57: tFactor * expected_de evaluates to 0.20000000474040663, and
57: 1e-10 evaluates to 1e-10.
57: 
57: [  FAILED  ] defmodel.evaluator_horizontal_unit_metre (6 ms)
57: [ RUN      ] defmodel.evaluator_projected_crs
57: [       OK ] defmodel.evaluator_projected_crs (0 ms)
57: [----------] 13 tests from defmodel (14 ms total)
57: 
57: [----------] Global test environment tear-down
57: [==========] 13 tests from 1 test suite ran. (14 ms total)
57: [  PASSED  ] 12 tests.
57: [  FAILED  ] 1 test, listed below:
57: [  FAILED  ] defmodel.evaluator_horizontal_unit_metre
57: 
57:  1 FAILED TEST
56/60 Test #57: test_defmodel ....................***Failed    0.02 sec
[...]
95% tests passed, 3 tests failed out of 60

Total Test time (real) =  30.10 sec

The following tests FAILED:
      9 - 4D-API-cs2cs-style (Failed)
     54 - proj_test_cpp_api (Failed)
     57 - test_defmodel (Failed)
Errors while running CTest
make[2]: *** [Makefile:94: test] Error 8
[...]

Full buildlog: s390x


rouault commented 1 year ago

@sebastic Those areas of the code haven't changed since PROJ 9.2.1, and I see in https://buildd.debian.org/status/fetch.php?pkg=proj&arch=s390x&ver=9.2.1-1&stamp=1686496069&raw=0 they were successful, but there was a bump of gcc version from 12 to 13.

The grid->isNodata(-88.8888f, 1.0) failure is particularly concerning regarding the correctness of floating-point operations on that architecture, since the body of GTXVerticalShiftGrid::isNodata(float val, double multiplier) is

    return val * multiplier > 1000 || val * multiplier < -1000 ||
           val == -88.88880f;

I can't imagine a good reason for that to fail...

We have a s390x CI target and it runs master successfully: https://app.travis-ci.com/github/OSGeo/PROJ/jobs/608699310 , but with an older gcc version. Hence my strong suspicion it is an issue with gcc 13.

sebastic commented 1 year ago

We'll ignore the test failure on s390x for the time being.

I've contacted the s390 porters about this issue, they might be able to look into the gcc-13 regression.

Vishwanatha-HD commented 1 year ago

We (Gayathri and myself) have started looking into this issue and will provide an update soon. Thanks.

Gayathri-Berli commented 11 months ago

Hi @sebastic / @rouault, we have extracted the failing logic into an individual test case. While debugging it in several ways, we found that the issue is triggered by the CMAKE_CXX_EXTENSIONS OFF setting in the CMakeLists.txt file. By default this flag is ON. When it is set to OFF, the -std flag is set to c++11, and the test suites that compare a float variable with a negative float constant (or with any other float value) fail. When CMAKE_CXX_EXTENSIONS is ON, we noticed that the -std flag is set to gnu++11 (i.e. -std=gnu++11), and the test suites pass without any errors.
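A minimal sketch of such a standalone case, assuming the comparison is copied verbatim from the GTXVerticalShiftGrid::isNodata body quoted earlier in this thread. The free-function names isNodataGtx and nodataMarkerRecognized are hypothetical and not part of the PROJ API:

```cpp
// Hypothetical free-function copy of the check quoted from
// GTXVerticalShiftGrid::isNodata, so it can be compiled outside PROJ.
// Returns true for scaled values outside +/-1000, or for the marker
// value -88.8888 that the quoted code treats as nodata.
bool isNodataGtx(float val, double multiplier) {
    return val * multiplier > 1000 || val * multiplier < -1000 ||
           val == -88.88880f;
}

// Mirrors the failing assertion grid->isNodata(-88.8888f, 1.0) from
// test_grids.cpp, which is expected to be true.
bool nodataMarkerRecognized() {
    return isNodataGtx(-88.8888f, 1.0);
}
```

Building the same file once with g++ -std=c++11 and once with g++ -std=gnu++11, then printing nodataMarkerRecognized(), is one way to compare the two dialects; per the report above, only the strict c++11 build returned false with the affected s390x gcc-13.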

sebastic commented 11 months ago

Do I understand correctly that the c++11 implementation in GCC is buggy?

Gayathri-Berli commented 10 months ago

Yes, we have checked this issue with our compiler team. They also communicated with the GCC maintainers and confirmed that the configure switch will be enabled in the next Debian and Ubuntu releases. With CMAKE_CXX_EXTENSIONS ON (-std=gnu++11), we haven't seen any errors. Please confirm from your side; if a workaround is needed, we are ready to raise a PR for that change and fix.

sebastic commented 10 months ago

We already ignore the test failure on s390x. That could be replaced with a patch to set CMAKE_CXX_EXTENSIONS, but that doesn't seem worth the effort.

Once the issue is fixed in GCC we can stop ignoring the test failure.

The gcc-13 changelog mentions:

  • Configure with --disable-s390-excess-float-precision for sid/trixie and Ubuntu noble (24.04 LTS).

Is this the change in question?

Andreas-Krebbel commented 10 months ago

Yes, with that change the problem will go away.

Actually --disable-s390-excess-float-precision is the default when configuring GCC on IBM Z. The Debian/Ubuntu compiler accidentally has been configured with --enable-s390-excess-float-precision, which was supposed to be only a temporary workaround.
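A small probe for checking how a given compiler configuration evaluates float expressions, offered as a sketch rather than a verified diagnostic. FLT_EVAL_METHOD from <cfloat> is defined by the C standard (and available in C++ through <cfloat>): 0 means operations and constants are evaluated in their own type, while 1 means float operations and constants are evaluated to double range and precision. The __STRICT_ANSI__ check relies on GCC (and Clang) defining that macro for -std=c++NN but not for -std=gnu++NN. We have not verified what either value reports on the Debian s390x gcc-13 build; the function names are hypothetical.

```cpp
#include <cfloat>
#include <cstdio>
#include <string>

// Human-readable meaning of FLT_EVAL_METHOD as specified by C99/C11.
std::string describeFltEvalMethod(int method) {
    switch (method) {
        case 0:  return "operations and constants evaluated in the operand type";
        case 1:  return "float and double operations and constants evaluated as double";
        case 2:  return "all operations and constants evaluated as long double";
        case -1: return "indeterminable";
        default: return "implementation-defined";
    }
}

// Prints the evaluation method and the language dialect of this build.
void reportFloatEvaluation() {
    const int method = static_cast<int>(FLT_EVAL_METHOD);
    std::printf("FLT_EVAL_METHOD = %d (%s)\n", method,
                describeFltEvalMethod(method).c_str());
#ifdef __STRICT_ANSI__
    // GCC and Clang define this under strict ISO modes such as -std=c++11.
    std::printf("dialect: strict ISO C++\n");
#else
    // GNU modes such as -std=gnu++11, or a compiler without this macro.
    std::printf("dialect: GNU extensions or unknown\n");
#endif
}
```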

For C++ compilers with excess precision enabled, there is a problem with literals due to contradictory statements in the current version of the C++ standard. This is currently being debated in the C++ standards committee: https://cplusplus.github.io/CWG/issues/2752.html https://github.com/cplusplus/papers/issues/1584

This problem makes the testcase fail when using compilers with excess precision enabled. So while it might take a while to get an actual fix for the problem, disabling excess precision in GCC for IBM Z is the right way to go and should be used everywhere.

The OSGeo testcase problem surfaced with GCC 13 since older GCCs did not use the excess precision settings for C++ (only for C).
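To illustrate the literal problem in isolation, the sketch below emulates both readings of val == -88.88880f explicitly, so it behaves the same on any IEEE 754 target regardless of compiler settings. It assumes, following the CWG 2752 discussion linked above, that an excess-precision compiler may give the literal the double-precision value of the decimal constant, while val has already been rounded to float when it was passed as an argument. The function names are hypothetical.

```cpp
// Reading 1 (no excess precision): the literal is rounded to float,
// exactly like the argument, so the nodata marker compares equal.
bool markerEqualsFloatLiteral(float val) {
    const float literal = -88.88880f;  // initialization rounds to float
    return val == literal;
}

// Reading 2 (assumed excess-precision behavior): the literal keeps the
// double value of -88.88880, but val holds the nearest float, so the
// two differ in the low-order bits and the comparison is false.
bool markerEqualsWideLiteral(float val) {
    const double literal = -88.88880;
    return static_cast<double>(val) == literal;
}
```

Since -88.8888 is not exactly representable in binary floating point, markerEqualsFloatLiteral(-88.8888f) is true while markerEqualsWideLiteral(-88.8888f) is false, which is consistent with the isNodata failure in GridTest.VerticalShiftGridSet_gtx.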

sebastic commented 10 months ago

Confirmed fixed with the latest gcc-13 from Debian unstable on the s390x porterbox.