BerkeleyLab / caffeine

A parallel runtime library for Fortran compilers
https://berkeleylab.github.io/caffeine/

Fix CI #114

Closed ktras closed 4 months ago

ktras commented 4 months ago

Increment the Ubuntu, gfortran, and g++ versions in CI

ktras commented 4 months ago

@bonachea Updating the GCC_VERSION variable in the script broke the macOS CI. I am trying to fix it and found that the macOS version in the CI was very old, so I have updated that, but then there is an issue with amudprun when building gasnet. Can you check the CI output and see if you have any ideas as to why that is occurring?

bonachea commented 4 months ago

> @bonachea Updating the GCC_VERSION variable in the script broke the macOS CI. I am trying to fix it and found that the macOS version in the CI was very old, so I have updated that, but then there is an issue with amudprun when building gasnet. Can you check the CI output and see if you have any ideas as to why that is occurring?

The error:

/usr/local/bin/g++-14 -O2  -Wall -Wpointer-arith -Wwrite-strings -Wmissing-format-attribute -Winit-self -Wvla -Wexpansion-to-defined -Woverlength-strings -Wclobbered -Wcast-function-type -Wempty-body -Wignored-qualifiers -Wimplicit-fallthrough -Wuninitialized -Wshift-negative-value -Wno-format-overflow -Wno-format-truncation  -Wno-array-bounds -Wno-stringop-overflow -Wno-dangling-pointer -Wno-use-after-free -Wno-unused -Wunused-result -Wno-unused-parameter -Wno-address    -I. -I. -I./../amx -I./.. -D_GNU_SOURCE=1 -DAMUDP_NDEBUG=1   -DSIZEOF_CHAR=1 -DSIZEOF_SHORT=2 -DSIZEOF_INT=4 -DSIZEOF_LONG=8 -DSIZEOF_LONG_LONG=8 -DSIZEOF_VOID_P=8 -DSIZEOF_SIZE_T=8 -DSIZEOF_PTRDIFF_T=8 -DHAVE_STDINT_H -DCOMPLETE_STDINT_H -DHAVE_INTTYPES_H -DCOMPLETE_INTTYPES_H -DHAVE_SYS_TYPES_H -DAMX_ENV_PREFIX=GASNET  -I./../.. -I../.. -DHAVE_GASNET_TOOLS  -o amudprun amudprun.o -L. -lamudp -L../.. -lgasnet_tools-seq     
0  0x10191af43  __assert_rtn + 64
1  0x10181cf43  ld::AtomPlacement::findAtom(unsigned char, unsigned long long, ld::AtomPlacement::AtomLoc const*&, long long&) const + 1411
2  0x101839431  ld::InputFiles::SliceParser::parseObjectFile(mach_o::Header const*) const + 19745
3  0x1018468ea  ld::InputFiles::SliceParser::parse() const + 3242
4  0x101849b71  ld::InputFiles::parseAllFiles(void (ld::AtomFile const*) block_pointer)::$_7::operator()(unsigned long, ld::FileInfo const&) const + 657
5  0x7ff80bcad066  _dispatch_client_callout2 + 8
6  0x7ff80bcbee09  _dispatch_apply_invoke + 213
7  0x7ff80bcad033  _dispatch_client_callout + 8
8  0x7ff80bcbd0f6  _dispatch_root_queue_drain + 683
9  0x7ff80bcbd768  _dispatch_worker_thread2 + 170
10  0x7ff80be4ac0f  _pthread_wqthread + 257
ld: Assertion failed: (resultIndex < sectData.atoms.size()), function findAtom, file Relocations.cpp, line 1336.
collect2: error: ld returned 1 exit status

This failure mode indicates the Apple linker has crashed, which almost by definition cannot be caused by anything in our code.

I've never seen this particular failure mode before, but googling around I'm not at all surprised to see this appears to be yet another problem caused by Apple's brittle new linker (or at least gcc's compatibility with it), which also confirms it's not our bug.

Based on that page, you may be able to avoid the problem by upgrading to Xcode Command Line Tools version 15.1. Alternatively, you can force use of the classic, less-flaky linker by setting the environment variable LDFLAGS=-Wl,-ld_classic before configuring GASNet.
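A minimal sketch of how the LDFLAGS workaround could be wired into a CI script, so the flag is added only on macOS and any existing LDFLAGS value is preserved. The helper name `add_classic_linker_flag` is hypothetical and not part of the Caffeine or GASNet scripts; only the flag `-Wl,-ld_classic` comes from the suggestion above.

```shell
# Hypothetical helper (not from the Caffeine CI scripts): print an LDFLAGS
# value with the classic-linker flag appended when the OS name is Darwin.
add_classic_linker_flag() {
  os_name="$1"   # e.g. the output of `uname -s`
  current="$2"   # existing LDFLAGS value, may be empty
  if [ "$os_name" = "Darwin" ]; then
    # Keep any existing flags, separated by a single space.
    printf '%s\n' "${current:+$current }-Wl,-ld_classic"
  else
    # Other platforms: leave LDFLAGS untouched.
    printf '%s\n' "$current"
  fi
}

LDFLAGS="$(add_classic_linker_flag "$(uname -s)" "${LDFLAGS:-}")"
export LDFLAGS
# ...then run the GASNet configure step as the CI script normally would.
```

The variable has to be exported before the configure step runs, since configure records the linker flags it sees at that point.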

CC: @PHHargrove

ktras commented 4 months ago

@bonachea Thanks for the suggestions. I tried both and unfortunately neither worked. I tried different combinations of OS versions, linker options, and compiler options, but none worked other than using an older macOS version. It's not ideal to have to use an older OS, but I am testing locally on macOS 14.5 and everything works as expected on my machine.

bonachea commented 4 months ago

> @bonachea Thanks for the suggestions. I tried both and unfortunately neither worked.

I'm not convinced that LDFLAGS=-Wl,-ld_classic was correctly tested in isolation using the newest macOS. The only CI run I can see testing that setting failed early for an unrelated reason.

Are you certain you've fully explored that option?

ktras commented 4 months ago

@bonachea The run with ld_classic that failed is this one. There is a gfortran dynamic library that dyld can't find with this particular runner image configuration.

bonachea commented 4 months ago

> @bonachea The run with ld_classic that failed is this one. There is a gfortran dynamic library that dyld can't find with this particular runner image configuration.

Thanks for the clarification. I don't know how to resolve that dyld error, so I agree that might be a dead end.

ktras commented 4 months ago

@bonachea I agree. We have issue #91 open, so we can revisit this in the future when we have time to address that issue.