Can you please check whether your system is telling that there are two sockets:
$ cat /sys/devices/system/cpu/cpu*/topology/physical_package_id | sort -u
A CPU socket may consist of multiple CPU dies, but in your case there is only one die but (maybe wrongly) two sockets according to hwloc_init_nodeTopology:570.
The error is likely caused by CPU ID 0 being on socket 1. This means the first socket-level node in the topology tree has ID 1, not 0.
$ cat /sys/devices/system/cpu/cpu*/topology/physical_package_id | sort -u
0
1
and for completeness:
$ cat /sys/devices/system/cpu/cpu*/topology/physical_package_id
1
1
0
1
0
1
0
0
1
0
1
0
1
0
1
0
I am OK with the die, since I understand now that the list `Thread 0 Core 0 Die 0 Socket 1 inCpuSet 1` represents a hierarchy with local numbering.
Yes, it represents the hierarchy, but if you want to do more with LIKWID, the topology stuff should work; it's the foundation of everything in LIKWID. I will take a look.
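If it helps, the local-vs-OS numbering is easy to see with the hwloc C API directly. A minimal sketch (assuming hwloc and its headers are installed; nothing here is LIKWID-specific):

```c
#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* hwloc numbers packages (sockets) logically 0..n-1 in tree order,
       independent of the OS-assigned physical IDs. */
    int n = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PACKAGE);
    for (int i = 0; i < n; i++) {
        hwloc_obj_t pkg = hwloc_get_obj_by_type(topo, HWLOC_OBJ_PACKAGE, i);
        printf("package: logical %u, os %u\n",
               pkg->logical_index, pkg->os_index);
    }

    hwloc_topology_destroy(topo);
    return 0;
}
```

Compiled with e.g. `gcc demo.c $(pkg-config --cflags --libs hwloc)` (the file name demo.c is made up), this should print logical indices 0 and 1 even though the OS reports the IDs in the order 1, 0 on your box.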
OK, what I want: something similar to Windows' GetLogicalProcessorInformationEx for use in my https://github.com/plusterkopp/Java-Thread-Affinity lib.
Totally unrelated to this issue: is there something lighter than the full liblikwid.so if I only need the topology detection? I only want to access an .so from JNA.
Since you don't seem to have a problem linking with a shared library, I would use hwloc. It's also used by LIKWID internally to get the topology information. If you want to have it more basic, you can write the stuff on your own. It's not that difficult if you leave out some features (hierarchical NUMA nodes, ...). I wrote such an interface in Golang for another project: ccTopology.
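For the do-it-yourself route, the minimal sysfs-based version is indeed short. A rough C sketch (the hard-coded NCPUS and the missing hotplug/error handling are simplifications; a real version would enumerate /sys/devices/system/cpu/):

```c
#include <stdio.h>

#define NCPUS 16  /* placeholder: hard-coded CPU count for brevity */

/* Read a single integer ID from a per-CPU sysfs topology file. */
static int read_id(const char *fmt, int cpu)
{
    char path[128];
    int id = -1;
    snprintf(path, sizeof(path), fmt, cpu);
    FILE *f = fopen(path, "r");
    if (f) {
        if (fscanf(f, "%d", &id) != 1)
            id = -1;
        fclose(f);
    }
    return id;
}

int main(void)
{
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        int pkg  = read_id("/sys/devices/system/cpu/cpu%d/topology/physical_package_id", cpu);
        int core = read_id("/sys/devices/system/cpu/cpu%d/topology/core_id", cpu);
        printf("cpu %2d: socket %d, core %d\n", cpu, pkg, core);
    }
    return 0;
}
```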
> Since you don't seem to have a problem linking with a shared library, I would use hwloc.
If I install that, I can just call hwloc-ls --no-io and parse the output. No need to mess with JNA parameter voodoo. That's actually a great idea. I'll try that.
> If you want to have it more basic, you can write the stuff on your own. It's not that difficult if you leave out some features (hierarchical NUMA nodes, ...). I wrote such an interface in Golang for another project: ccTopology.
In the case of hwloc, the data structures with their use of unions are even more complex than likwid's. It's like the Windows implementation (GetLogicalProcessorInformationEx returns a pointer to unions, too).
In general I’d like to stay away from JNA with its need to recreate all data types on the Java side.
Ever since I had this up and running in 2015, I have been looking for a Linux equivalent that gives better topology info than /proc/cpuinfo provides. Didn't think of hwloc, much less of likwid.
Many thanks for your work :)
I would try to attach to a library to get the information in a consistent format. Parsing the CLI output is often not future-proof. There are existing wrappers: https://github.com/UDC-GAC/jhwloc
You can add a small C-snippet in front of hwloc providing a simple interface with all the information you need to avoid unions and complicated JNA type recreation. Or you write it yourself in pure Java derived from the ccTopology code.
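Such a shim could look roughly like this — cputopo_entry and cputopo_get are made-up names, and only socket and core are exposed here to keep the JNA side a flat struct array:

```c
/* cputopo.c -- hypothetical flat shim in front of hwloc for JNA use.
 * Build as a shared library, e.g.:
 *   gcc -shared -fPIC cputopo.c -o libcputopo.so $(pkg-config --cflags --libs hwloc)
 */
#include <hwloc.h>

typedef struct {
    int os_id;   /* OS CPU number (what affinity calls use) */
    int socket;  /* logical package index, always 0..n-1 */
    int core;    /* logical core index */
} cputopo_entry;

/* Fills at most max entries; returns the number of hardware threads,
 * or -1 on error. */
int cputopo_get(cputopo_entry *out, int max)
{
    hwloc_topology_t topo;
    if (hwloc_topology_init(&topo) < 0)
        return -1;
    if (hwloc_topology_load(topo) < 0) {
        hwloc_topology_destroy(topo);
        return -1;
    }

    int n = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
    for (int i = 0; i < n && i < max; i++) {
        hwloc_obj_t pu   = hwloc_get_obj_by_type(topo, HWLOC_OBJ_PU, i);
        hwloc_obj_t core = hwloc_get_ancestor_obj_by_type(topo, HWLOC_OBJ_CORE, pu);
        hwloc_obj_t pkg  = hwloc_get_ancestor_obj_by_type(topo, HWLOC_OBJ_PACKAGE, pu);
        out[i].os_id  = (int)pu->os_index;
        out[i].socket = pkg  ? (int)pkg->logical_index  : -1;
        out[i].core   = core ? (int)core->logical_index : -1;
    }

    hwloc_topology_destroy(topo);
    return n;
}
```

With that, the Java side only has to map one plain struct and one function instead of hwloc's union-heavy object tree.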
Thanks for the flowers :tulip:
P.S. If you have hwloc installed, please send me/attach the tarball of hwloc-gather-topology $(hostname -s). Then I can analyze your topology with LIKWID and fix the problems.
srvsd002.tar.zip (stupid GitHub doesn't like .bz2 or .7z)
I actually did what you suggested when I tried the Windows API. Couldn’t get it to work reliably with JNA since the returned topology entries were all of different size, so I ended up writing an extra DLL that took the Windows API and rearranged the result data.
But this time, I think I stay with the text output and update my parser if something changes.
Describe the bug
This is a 2-way server from 2009 (don't hit me). It correctly reports 2 sockets, 4 cores per socket and 2 threads per core. But when listing the logical CPUs, they are all on the same die and socket. Likewise, there is only Socket 1 and it has all the CPUs. The L3 cache affinity, however, is correct (even CPUs on one cache, odd CPUs on the other). NUMA also checks out with 2 domains.
To Reproduce with a LIKWID command
Please supply the output of the command with -V 3 added to the command:

Additional context
Maybe this is just a reporting problem, since "hwloc_init_nodeTopology:570" seems to be on the right track (assuming a die is part of a socket). I should be able to circumvent this by focusing on L3 or NUMA affinity instead of socket, but maybe this occurs elsewhere, too.