RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

[BUG] Events and Counters not initialized with cc-metric-collector #621

Open Joehoch2 opened 4 months ago

Joehoch2 commented 4 months ago

Dear Likwid-Team,

I have built a new version of the cc-metric-collector and tried to update likwid to 5.3.0. When I start the cc-metric-collector with the -debug option I get the following message:

cc-metric-collector[27367]: ERROR - [/root/likwid-5.3.0/src/perfmon.c:perfmon_init:2109] No such file or directory.  
cc-metric-collector[27367]: Failed to initialize event and counter lists for Intel Xeon Broadwell EN/EP/EX processor  
cc-metric-collector[27367]: ERROR 2024/04/23 13:24:57 [LikwidCollector|/root/cc-metric-collector/collectors/likwidMetric.go:781] [failed to initialize library, error -22]

The error does not occur with version 5.2.2 of likwid. So I thought maybe this is the right contact point to get help.

To Reproduce

I hope this information is enough. If not, please let me know.

TomTheBear commented 4 months ago

Thanks for the issue, but it does not seem to be in the proper location (repository). Since LIKWID itself seems to work (likwid-perfctr -e works), the problem is probably on the cc-metric-collector side.

The case is odd: this error usually occurs when multiple LIKWID versions are installed and the wrong one is picked at runtime, but the output clearly states 5.3.0 (/root/likwid-5.3.0/src/perfmon.c:perfmon_init:2109). And for a LIKWID library not to support BroadwellEP, it would have to be really ancient.

Joehoch2 commented 4 months ago

I dug deeper into this issue. First, I remembered that building the cc-metric-collector downloads an older LIKWID version and copies its header files. So I changed the Makefile to download the latest version, but with no success. After that I compared src/perfmon.c between LIKWID 5.3.0 and 5.2.2 and found the code block that produces the error:

    ret = perfmon_init_maps();
    if (ret < 0)
    {
        ERROR_PRINT(Failed to initialize event and counter lists for %s, cpuid_info.name);
        HPMfinalize();
        return ret;
    }

I replaced the whole code block with a bare perfmon_init_maps(); call and it works as in version 5.2.2, but I don't think that is the intended fix, because the return code should be 0 if the function succeeded, I guess. I hope I got that right.
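To illustrate why dropping the check hides the failure rather than fixing it, here is a minimal sketch of the two variants. The names `init_maps_stub`, `init_checked`, and `init_unchecked` are hypothetical stand-ins and not part of LIKWID. The stub is only assumed to follow the same convention as the block above: 0 on success, a negative errno-style code on failure. The -22 reported by cc-metric-collector matches -EINVAL on Linux, though whether that value comes directly from perfmon_init_maps() is an assumption.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for perfmon_init_maps(): returns 0 on success
 * and a negative errno-style code on failure (assumed convention). */
static int init_maps_stub(int simulate_failure)
{
    return simulate_failure ? -EINVAL : 0;
}

/* Checked variant, shaped like the 5.3.0 block: a negative return
 * value is reported and propagated to the caller. */
int init_checked(int simulate_failure)
{
    int ret = init_maps_stub(simulate_failure);
    if (ret < 0)
    {
        fprintf(stderr, "init failed with code %d\n", ret);
        return ret;
    }
    return 0;
}

/* Unchecked variant, shaped like the workaround: the return value is
 * discarded, so the caller sees success even if the stub failed. */
int init_unchecked(int simulate_failure)
{
    (void)init_maps_stub(simulate_failure);
    return 0;
}
```

With `simulate_failure` set, `init_checked` returns -EINVAL while `init_unchecked` returns 0, so the workaround only makes the caller proceed despite the failed step. The underlying question, why the map initialization returns a negative value in this setup, remains open.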

Nevertheless, thank you for the response. Shall I copy this issue to the other repository?