RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

symbol lookup error with likwid-appDaemon.so #587

Open wht0703 opened 8 months ago

wht0703 commented 8 months ago

Hi, I recently encountered the following error when I tried to use the timeline mode of likwid-perfctr to collect power usage data for the Rodinia GPU benchmarks on an NVIDIA GPU:

/var/tmp/likwid/bin/likwid-perfctr -G 0 -W POWER -t 100ms "./run"
--------------------------------------------------------------------------------
CPU name:   Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz
CPU type:   Intel Icelake SP processor
CPU clock:  2.39 GHz
--------------------------------------------------------------------------------
Old LD_PRELOAD=likwid-appDaemon.so
/bin/sh: symbol lookup error: /var/tmp/likwid/lib/likwid-appDaemon.so: undefined symbol: bfromcstr
--------------------------------------------------------------------------------

At first, I thought that likwid-appDaemon.so failed to load one of its dependencies at runtime. However, when I checked its linked libraries with ldd, I got the following output:

ldd /var/tmp/likwid/lib/likwid-appDaemon.so
    linux-vdso.so.1 (0x00007fff6d44c000)
    liblikwid-gotcha.so.5.3 => /var/tmp/likwid/lib/liblikwid-gotcha.so.5.3 (0x00007f1a73fd6000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f1a739ca000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f1a7367f000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f1a7347b000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1a73258000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1a73dbf000)

It seems that all the required libraries were located. Do you have any idea what could be causing this problem?

TomTheBear commented 8 months ago

You are on the right track. All listed libraries were found, but one is missing from the list: liblikwid.so. You could try setting LD_PRELOAD=liblikwid.so before starting likwid-perfctr. I'll check the Makefiles to see why likwid-appDaemon.so is not linked against liblikwid.so.

wht0703 commented 8 months ago

Hi, thank you for your response. However, the issue still persists even when I manually set LD_PRELOAD=liblikwid.so.

/var/tmp/likwid/bin/likwid-perfctr -G 0 -W POWER -t 100ms ./run
--------------------------------------------------------------------------------
CPU name:   Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz
CPU type:   Intel Icelake SP processor
CPU clock:  2.39 GHz
--------------------------------------------------------------------------------
Old LD_PRELOAD=likwid-appDaemon.so:liblikwid.so
/bin/sh: symbol lookup error: /var/tmp/likwid/lib/likwid-appDaemon.so: undefined symbol: bfromcstr
--------------------------------------------------------------------------------

TomTheBear commented 8 months ago

OK, so we have to rebuild to fix it. In src/access-daemon/Makefile, change the CPPFLAGS:

-CPPFLAGS :=  $(DEFINES) $(INCLUDES) -L$(PREFIX)/lib
+CPPFLAGS :=  $(DEFINES) $(INCLUDES) -L../..

Then rebuild (make distclean && make). likwid-appDaemon.so should now be linked against liblikwid.so.
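One way to confirm that the rebuild took effect is to inspect the dynamic section of the new library instead of relying only on ldd. The sketch below assumes GNU binutils (`readelf`, `nm`) are installed; the function name `check_appdaemon_link` is hypothetical and not part of LIKWID. It reports whether liblikwid.so appears among the library's NEEDED entries and, if it does not, whether `bfromcstr` is still an unresolved dynamic symbol.

```shell
#!/bin/sh
# Hypothetical helper (not part of LIKWID); assumes GNU binutils.
# Exit status: 0 = liblikwid.so is a direct dependency,
#              1 = it is not (or the file could not be inspected),
#              2 = the file does not exist.
check_appdaemon_link() {
    lib="$1"   # POSIX sh has no 'local', so this is a global variable
    if [ ! -f "$lib" ]; then
        echo "not found: $lib" >&2
        return 2
    fi
    # DT_NEEDED entries recorded at link time; the pattern requires a literal
    # dot after 'liblikwid', so liblikwid-gotcha.so does not match.
    if readelf -d "$lib" 2>/dev/null | grep -q 'NEEDED.*liblikwid\.so'; then
        echo "liblikwid.so is a direct dependency of $lib"
        return 0
    fi
    # Undefined dynamic symbols must be provided by some other loaded object.
    if nm -D --undefined-only "$lib" 2>/dev/null | grep -qw bfromcstr; then
        echo "bfromcstr is undefined and liblikwid.so is not in NEEDED" >&2
    fi
    return 1
}
```

For example, `check_appdaemon_link /var/tmp/likwid/lib/likwid-appDaemon.so` should print the first message if the library was relinked as intended.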

TomTheBear commented 8 months ago

This should now be fixed in the master branch. It would be great if you could test it and comment on/close the issue.

wht0703 commented 8 months ago

Hi, I cloned the updated repo and built LIKWID again. However, the problem persists. I'm wondering whether this issue is related to the access mode selected at build time (direct or perf_event). Since I don't have sudo rights on my testbed system, I can't use the access daemon.

TomTheBear commented 8 months ago

I will try the other modes.

mhorst00 commented 3 months ago

I have the same issue while trying to run timeline mode on AMD GPUs. I'm on commit #69971d, which should include your previous fix for the missing liblikwid.so. My ldd looks like this:

ldd /var/tmp/likwid-lua/lib/likwid-appDaemon.so 
        linux-vdso.so.1 (0x00007ffd9acdb000)
        liblikwid-gotcha.so.5.3 => /var/tmp/likwid-lua/lib/liblikwid-gotcha.so.5.3 (0x00007fb34442c000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fb343e4e000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fb343acc000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fb3438c8000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb3436a8000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fb344213000)

I compiled with ROCm 5.7.2 and use the access daemon and the app daemon. Manually prefixing the command with LD_PRELOAD did not work either.