adesutherland / CREXX

REXX Language implementation

feature/f0049 stopped building successfully under Ubuntu Linux #380

Open rvjansen opened 1 week ago

rvjansen commented 1 week ago

macOS builds OK. Linux mutters about LTO and parallel; on emulated Ubuntu ARM32 this can be solved with `ninja -j 1`, although linkage editing then takes an enormous amount of time.

On real iron (x86_64) Ubuntu the linkage editor tells us:

FAILED: decimal/rxdec_mc_decimal_dyn.decplugin
: && /usr/bin/cc -fPIC -O3 -DNDEBUG   -shared  -o decimal/rxdec_mc_decimal_dyn.decplugin decimal/CMakeFiles/mc_decimal_dyn.dir/mc_decimal.c.o  decnumber/libdecnumber.a && :
/usr/bin/ld: decnumber/libdecnumber.a(decNumber.c.o): warning: relocation against `DECPOWERS' in read-only section `.text'
/usr/bin/ld: decnumber/libdecnumber.a(decNumber.c.o): relocation R_X86_64_PC32 against symbol `DECPOWERS' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
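
For readers hitting the same error: the linker is saying that `libdecnumber.a` was compiled without position-independent code but is being linked into a shared object (the `.decplugin`). One common way to address this in CMake is sketched below. The target name `decnumber` is an assumption inferred from the archive name `libdecnumber.a`, and this is not necessarily how the project resolves it.

```cmake
# Sketch only: 'decnumber' is an assumed target name (inferred from libdecnumber.a).
# POSITION_INDEPENDENT_CODE makes CMake add -fPIC (or the platform equivalent)
# when compiling the static library, so its objects can go into a shared object.
if(TARGET decnumber)
  set_target_properties(decnumber PROPERTIES POSITION_INDEPENDENT_CODE ON)
endif()
```

Setting `CMAKE_POSITION_INDEPENDENT_CODE` to `ON` before the library is defined would have a similar effect for all targets created afterwards.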

adesutherland commented 1 week ago

I did a push yesterday. Have you got that? Did it fix it?

rvjansen commented 1 week ago

I've got it. It fixes the breakage, but linkedits are still off-scale.

adesutherland commented 1 week ago

I also saw linking being slow. I found this is caused by the link-time optimisations being applied to all versions of the VM in RELEASE, and I suspect this is caused by linking in the decimal library. Well - I am changing how decimal works, so let's keep the issue open until then.

In the meantime, can you try giving the Ubuntu machine much more RAM to see if that helps? It should (maybe) - I read that LTO is very memory intensive - basically it is trying to inline functions across libraries, so there is a lot to keep track of.

If worst comes to worst, we will turn off LTO after some speed tests - but I think we will be able to sort it.

rvjansen commented 1 week ago

The machine with the x86_64 Ubuntu has 16GB - that should be sufficient; high watermark is 3.53G and there is no swap. It does, however, peg one CPU at all times during linkedit.

[Screenshot 2024-10-09 at 15 45 21: perf top output]

This perf top puts the blame on LTO in a very clear way.

adesutherland commented 1 week ago

As you say, CPU bound. I suspect I will end up turning it off and making sure that the key inner loop is in one file. I imagined it was costless!

Search for INTERPROCEDURAL_OPTIMIZATION... (there are a few ending in _RELEASE etc.) in the CMake file in the interpreter. You can turn it off, although it will be interesting to know if it makes any difference to the performance of crexx.

rvjansen commented 1 week ago

I've got some surprising results. On my physical Linux box, INTERPROCEDURAL_OPTIMIZATION makes:

My vote would be to temporarily remove lto until we understand what is going on.

rvjansen commented 1 week ago

Most time is spent in a function called get_continuation_for_phi (shown as `[.] get_continuation_for_phi` in perf top). I have tried -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=False (and FALSE, for that matter), but the only thing that really helps is commenting out the set_property(TARGET rxvme PROPERTY INTERPROCEDURAL_OPTIMIZATION_RELEASE TRUE) lines in CMakeLists.txt. This needs to be done for all of the targets, because the setting seems to be contagious.
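
A likely reason the command-line switch has no effect: `CMAKE_INTERPROCEDURAL_OPTIMIZATION` only supplies the initial value of each target's `INTERPROCEDURAL_OPTIMIZATION` property when the target is created, so an explicit `set_property(... INTERPROCEDURAL_OPTIMIZATION_RELEASE TRUE)` on a target still wins for Release builds. Instead of commenting the lines out, they could be gated behind an option. The following is a minimal sketch under stated assumptions: the option name `CREXX_ENABLE_LTO` and the list variable `CREXX_LTO_TARGETS` are hypothetical, and only `rxvme` is a target name taken from this thread.

```cmake
# Hypothetical option; it does not exist in the CREXX sources.
option(CREXX_ENABLE_LTO "Build release targets with link-time optimisation" OFF)

# Only rxvme is named in this thread; the other VM targets would be listed here.
set(CREXX_LTO_TARGETS rxvme)

foreach(tgt IN LISTS CREXX_LTO_TARGETS)
  if(TARGET ${tgt})
    # Explicitly set the per-configuration property so Release builds follow the option.
    set_property(TARGET ${tgt} PROPERTY
                 INTERPROCEDURAL_OPTIMIZATION_RELEASE ${CREXX_ENABLE_LTO})
  endif()
endforeach()
```

With something like this in place, LTO would be off by default and could be switched back on for speed comparisons by configuring with `-DCREXX_ENABLE_LTO=ON`.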