gnudatalanguage / gdl

GDL - GNU Data Language
GNU General Public License v2.0

OSX: performance issues within Clang + duplicate symbols with g++-13 #1755

Closed alaingdl closed 6 months ago

alaingdl commented 6 months ago

OK, the performance issues within FOR loops, first detected on Mac M2/M3, are in fact also present on x86_64.

The OSX versions here were compiled with the script, and OpenMP is declared as ON (tests 4, 5, 16 and 25 are all bad, and 2 also regresses since clang 17 :( ).

Unfortunately I cannot finish the compilation with GCC 13 because of duplicate symbols:

CC=/usr/local/bin/gcc-13 CXX=/usr/local/bin/g++-13 cmake .. -DREADLINE=no -DHDF=OFF -DHDF5=OFF -DPYTHON=off -DGRAPHICSMAGICK=off -DMAGICK=OFF -DWXWIDGETS=off -DQHULL=off

[...]  // the first ones 

[ 15%] Linking CXX executable gdl
duplicate symbol '__ZTS5Data_I10SpDComplexE' in:
    CMakeFiles/gdl.dir/datatypes.cpp.o
    CMakeFiles/gdl.dir/basic_op.cpp.o
duplicate symbol '__ZTI5Data_I10SpDComplexE' in:
    CMakeFiles/gdl.dir/datatypes.cpp.o
    CMakeFiles/gdl.dir/basic_op.cpp.o

[...] // the last ones

duplicate symbol '__ZTS5Data_I9SpDLong64E' in:
    CMakeFiles/gdl.dir/datatypes.cpp.o
    CMakeFiles/gdl.dir/ofmt.cpp.o
duplicate symbol '__ZTI5Data_I9SpDLong64E' in:
    CMakeFiles/gdl.dir/datatypes.cpp.o
    CMakeFiles/gdl.dir/ofmt.cpp.o
ld: 252 duplicate symbols for architecture x86_64
collect2: error: ld returned 1 exit status

datatypes.cpp.o is always involved ...
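
The mangled names in the linker output can be decoded to see what kind of symbol is duplicated. The sketch below is only an illustration and not part of GDL: it assumes a GCC or Clang toolchain that provides abi::__cxa_demangle in cxxabi.h, and the helper name demangle_macos is hypothetical. The macOS linker prints one extra leading underscore, which has to be dropped before demangling.

```cpp
#include <cassert>    // assert, handy when checking the helper
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang runtimes)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper, not part of GDL: demangle a symbol as printed by the
// macOS linker, which prefixes one extra underscore to every C++ symbol.
std::string demangle_macos(const std::string& linker_name) {
    // Drop the platform underscore: "__ZTS..." becomes "_ZTS...".
    std::string mangled = linker_name;
    if (mangled.rfind("__Z", 0) == 0) {
        mangled.erase(0, 1);
    }
    int status = 0;
    char* text = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return linker_name;  // not a valid mangled name: return input unchanged
    }
    std::string result(text);
    std::free(text);  // the demangler allocates the buffer with malloc
    return result;
}
```

Applied to the names above, the _ZTS and _ZTI prefixes decode to "typeinfo name for" and "typeinfo for" Data_<SpDComplex> and Data_<SpDLong64>, so the duplicates are RTTI data for template instantiations rather than ordinary functions.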

GillesDuvert commented 6 months ago

I confirm that -fsanitize=address makes gdl 100 times faster for code related to memory transfer (copy from variable to variable) on a Mac mini with M1. The code to be tested is simple:

GDL> tic & for i=1L,600000 do a=1 & toc
% Time elapsed : 4.5299740 seconds.

The same loop takes 0.057317972 seconds on my Intel Linux laptop (gcc compiler, no Eigen). By comparison:

GDL> tic & for i=1L,600000 do a=a & toc
% Time elapsed : 0.019397974 seconds.

Since a=a is internally optimised to do nothing, these 0.019397974 seconds measure the empty loop speed, which is OK.

This restricts the area of the problem to a very tiny number of code lines, essentially what happens in "a=1".

GillesDuvert commented 6 months ago

@alaingdl the multiply defined symbols have already been encountered (#677, #734) and should indeed be avoided. However, there have always been compiler options to circumvent that problem, which arises only on a limited number of platforms.

GillesDuvert commented 6 months ago

CULPRIT FOUND!!!

On OSX, for obscure historical reasons, and given that the system defines HAVE_MALLOC_ZONE_STATISTICS and HAVE_MALLOC_MALLOC_H, the innermost code for destruction of variables would call the obscure UpdateCurrent() function to report precise memory usage. The loss of time is tremendous, and would have shown up in a profiler as an enormous number of calls to strange functions like malloc_zone_statistics() etc.

Making UpdateCurrent() just return solves the speed problem: time_test4 drops to 1 sec.
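
To illustrate why this hurts, here is a minimal sketch, which is not GDL's actual code. The names MemStatsSketch, VariableSketch, run_loop, query_allocator_stats() and the flag track_on_free are all hypothetical; the stand-in query only counts how often it is called, where the real code would call something like malloc_zone_statistics(). The point is that a statistics query placed in the destruction path runs once per loop iteration, and an early return removes it entirely.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for an expensive allocator query such as malloc_zone_statistics().
// Hypothetical: it only counts calls so the effect is visible on any platform.
static std::size_t g_stat_queries = 0;
std::size_t query_allocator_stats() {
    ++g_stat_queries;
    return 0;  // a real query would return the bytes currently in use
}

// Hypothetical bookkeeping class, loosely modelled on the description above.
struct MemStatsSketch {
    static bool track_on_free;   // false mimics "UpdateCurrent() just returns"
    static std::size_t current;

    static void UpdateCurrent() {
        if (!track_on_free) return;         // the cheap early exit
        current = query_allocator_stats();  // the costly path
    }
};
bool MemStatsSketch::track_on_free = true;
std::size_t MemStatsSketch::current = 0;

// A variable whose destructor reports memory usage, as in the slow build.
struct VariableSketch {
    ~VariableSketch() { MemStatsSketch::UpdateCurrent(); }
};

// Mimics "for i=1L,n do a=1": every iteration destroys one value.
// Returns how many allocator queries the loop triggered.
std::size_t run_loop(std::size_t n, bool track) {
    MemStatsSketch::track_on_free = track;
    g_stat_queries = 0;
    for (std::size_t i = 0; i < n; ++i) {
        VariableSketch a;  // destroyed at the end of every iteration
        (void)a;
    }
    return g_stat_queries;
}
```

With tracking enabled, run_loop(600000, true) issues one query per iteration; with the early return, run_loop(600000, false) issues none, which is the shape of the fix described above.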

GillesDuvert commented 6 months ago

Just committed the one-liner that is supposed to do wonders.

alaingdl commented 6 months ago

@GillesDuvert : brilliant ! Thanks

Tested on an Intel OSX machine, using the script ...

GDL> time_test4
[...]
      1.10098=Total Time,      0.021701576=Geometric mean,      25 tests.

GDL> TEST_LOOPS
% Time elapsed : 0.0098431110 seconds.
% Time elapsed : 0.010197878 seconds.
% Time elapsed : 0.0053970814 seconds.
% Time elapsed : 0.0092120171 seconds.

brandy125 commented 6 months ago

Congrats!!!!


GillesDuvert commented 6 months ago

#1776