Open jonas-schulze opened 10 months ago
Follow-up question: are the `.init_fileTopology` field and the mentioned `static void initTopologyFile(FILE* file);` function dead code?
I'm sorry that you had such a bad first experience. I made the choice to only support LIKWID >= 5.2 but didn't document it properly. My bad.
- Document the version requirements for liblikwid
- Let LIKWID.jl fail gracefully, if it recognizes an unsupported version of liblikwid
Agreed 👍
- Let LIKWID.jl bundle its own liblikwid
That'd be great but it's difficult because LIKWID itself isn't portable yet. See https://github.com/JuliaPackaging/Yggdrasil/pull/4913 and the links over there.
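A minimal sketch of what "fail gracefully" could look like, assuming the wrapper can obtain the major and minor version of the loaded liblikwid before touching any structs. The function name `version_is_supported` and the two macros are hypothetical and not part of LIKWID or LIKWID.jl; the 5.2 threshold is the one stated above.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimum supported liblikwid version, as stated in this thread (>= 5.2). */
#define MIN_SUPPORTED_MAJOR 5
#define MIN_SUPPORTED_MINOR 2

/* Hypothetical helper: returns true if major.minor is at least 5.2.
 * How the caller obtains the version numbers is left open here. */
bool version_is_supported(int major, int minor)
{
    assert(major >= 0 && minor >= 0);
    if (major != MIN_SUPPORTED_MAJOR) {
        return major > MIN_SUPPORTED_MAJOR;
    }
    return minor >= MIN_SUPPORTED_MINOR;
}
```

With a check like this run at load time, a 5.1 installation could produce a readable error message instead of a segfault.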
Follow-up question: are the `.init_fileTopology` field and the mentioned `static void initTopologyFile(FILE* file);` function dead code?
Hi, this is not dead code. LIKWID provides `likwid-getTopoCfg` to create a topology file that is then read in with `initTopologyFile()`. Extremely helpful on systems where the topology lookup takes minutes, like HPE/SGI UltraViolet (>1000 CPU sockets).
For the CUPTI stuff in LIKWID, I use versioned structs based on the CUDA/CUPTI version.
It can be expected that with every x.y.0 release there will be changes in the API, so with 5.2.0 or 5.3.0 new struct members and functions were added.
I just saw your talk from JuliaCon 2023 that also mentioned LIKWID.jl and I was eager to try it out. :rocket:
Unfortunately, trying the very first tutorial, I got a segfault. After some digging, I found the culprit: the definitions of `CpuTopology` between LIKWID.jl and the underlying liblikwid differed. My `likwid.h` version 5.1 (installed via `apt`) defines the `CpuTopology` struct without the `numDies` field, while LIKWID.jl version 0.4.4 defines its `CpuTopology` struct having `numDies` in position 4. Consequently, the fields are shifted when transferring the data from C to Julia: Julia's `numDies` contains C's `numCoresPerSocket`, ..., `numCacheLevels` contains the value of `threadPool`, which is a pointer. Wrapping `cacheLevels` (C's `topologyTree`) into an array of length `numCacheLevels` (C's `threadPool`) leads to the malformed memory accesses. :boom:

likwid.h: https://github.com/RRZE-HPC/likwid/blob/v5.1/src/includes/likwid.h#L370-L380
LibLikwid.jl: https://github.com/JuliaPerf/LIKWID.jl/blob/v0.4.4/src/LibLikwid.jl#L640-L651
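The field shift described above can be reproduced in isolation. The sketch below uses two simplified stand-in structs, `TopoOld` (no `numDies`) and `TopoNew` (`numDies` in position 4). They are not the real LIKWID definitions: the pointer types are replaced by `void *`, the fields not named in this issue are assumed from memory of `likwid.h`, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the 5.1 layout: no numDies field.
 * Field list is an assumption; pointers are reduced to void *. */
typedef struct {
    uint32_t numHWThreads;
    uint32_t activeHWThreads;
    uint32_t numSockets;
    uint32_t numCoresPerSocket;
    uint32_t numThreadsPerCore;
    uint32_t numCacheLevels;
    void *threadPool;
    void *cacheLevels;
    void *topologyTree;
} TopoOld;

/* Simplified stand-in for the layout the wrapper expects:
 * numDies inserted in position 4. */
typedef struct {
    uint32_t numHWThreads;
    uint32_t activeHWThreads;
    uint32_t numSockets;
    uint32_t numDies;
    uint32_t numCoresPerSocket;
    uint32_t numThreadsPerCore;
    uint32_t numCacheLevels;
    void *threadPool;
    void *cacheLevels;
    void *topologyTree;
} TopoNew;

/* Copy the raw bytes of an old-layout struct into a new-layout struct,
 * which is effectively what happens when a wrapper built against the
 * new definition reads memory filled in by the old library. */
TopoNew read_old_as_new(const TopoOld *old)
{
    assert(old != NULL);
    TopoNew view;
    memset(&view, 0, sizeof view);
    size_t n = sizeof *old < sizeof view ? sizeof *old : sizeof view;
    memcpy(&view, old, n);
    return view;
}

/* Print the byte offsets that end up aliasing each other. */
void print_offset_mismatch(void)
{
    printf("new numDies        at %u, old numCoresPerSocket at %u\n",
           (unsigned)offsetof(TopoNew, numDies),
           (unsigned)offsetof(TopoOld, numCoresPerSocket));
    printf("new numCacheLevels at %u, old threadPool        at %u\n",
           (unsigned)offsetof(TopoNew, numCacheLevels),
           (unsigned)offsetof(TopoOld, threadPool));
}
```

Reading the result through `TopoNew`, every 32-bit field from `numDies` onward holds the value of its successor in the old layout, and `numCacheLevels` overlaps the start of the old `threadPool` pointer, so any array length derived from it is garbage.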
The `numDies` field has been added in https://github.com/RRZE-HPC/likwid/commit/a0ac14d2619222665df6afa7491a5958ff34f03e, which according to GitHub shipped in likwid version 5.2 and onwards. Moving the `numDies` field to the end of the C struct is not an option, I guess, and removing `numDies` from LIKWID.jl as it's not used anyways. Therefore, I would recommend: