Closed ficstamas closed 10 months ago
Thanks for reporting the issue. Unfortunately, I don't have access to FX1000 nodes anymore to test it, so I will follow your proposed fix and set `numHWThreads = max_id + 1` in `parse_cpuinfo`. This function is only used on ARM chips.
Can you run `hwloc-gather-topology` on one of the FX1000 nodes and attach the tarball here? Then I can test the topology stuff remotely in the future.
Are the "management cores" marked online in `/sys/devices/system/cpu/online`?
Can you please test whether the linked PR fixes it for you?
Ironically @ficstamas also lost access to A64FX machines :sweat_smile: so I'll try to help out:
-> % cat /sys/devices/system/cpu/online
0-1,12-59
I'm not 100% up to speed yet, but will slowly figure things out. @ficstamas already started explaining things to me.
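For reference, here is a minimal C sketch of how an online list like the one above can be read. It expands a CPU list string such as `0-1,12-59` and reports both the number of online CPUs and the highest CPU ID. The `summarize_cpulist` helper and its struct are hypothetical names used for illustration only and are not taken from LIKWID.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of LIKWID: summarizes a Linux CPU list
 * string such as "0-1,12-59". */
typedef struct {
    int count;   /* number of CPU IDs contained in the list */
    int max_id;  /* highest CPU ID in the list, -1 if the list is empty */
} CpuListSummary;

CpuListSummary summarize_cpulist(const char *list)
{
    CpuListSummary s = { 0, -1 };
    assert(list != NULL);

    const char *p = list;
    while (*p != '\0') {
        char *end;
        long first = strtol(p, &end, 10);
        if (end == p) break;              /* no number found: stop */
        long last = first;
        if (*end == '-') {                /* range "first-last" */
            const char *q = end + 1;
            last = strtol(q, &end, 10);
            if (end == q) break;          /* dangling '-': stop */
        }
        if (first < 0 || last < first) break;  /* malformed entry: stop */
        s.count += (int)(last - first + 1);
        if (last > s.max_id) s.max_id = (int)last;
        if (*end != ',') break;           /* end of list (or newline) */
        p = end + 1;
    }
    return s;
}
```

For `0-1,12-59` this yields a count of 50 but a highest ID of 59, so anything indexed by CPU ID needs 60 slots rather than 50.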
Many thanks for stepping in and providing the output. Thanks @ficstamas for your efforts.
Then the proposed fix in PR #568 should do it.
Testing:
$ git clone -b fix_a64fx_fx1000_detection git@github.com:RRZE-HPC/likwid.git likwid-fixed
$ cd likwid-fixed
$ make COMPILER=GCCARMv8 PREFIX=/tmp/likwid-install
$ make COMPILER=GCCARMv8 PREFIX=/tmp/likwid-install install
$ export PATH=/tmp/likwid-install/bin:$PATH
$ export LD_LIBRARY_PATH=/tmp/likwid-install/lib:$LD_LIBRARY_PATH
$ likwid-topology # The last 4 NUMA domains should contain 12 HW threads each
Oh, maybe one more difference is that I compiled the project with the Fujitsu compiler. @vatai try the above example first; if that does not work, use FCC.
It shouldn't make a difference whether GCC or FCC is used. Did it work for you with `COMPILER=FCC`, or were adjustments required?
It worked without issues, but I never trust FCC 😄 Let's just wait for @vatai.
I want to merge the PR. @vatai, it would be good if you could test soon.
TL;DR: Merge!
Terribly sorry for the slow reply. I didn't get around to checking GitHub, so I missed your tag! :(
With @ficstamas's help, we ran the fix_a64fx branch and it looks good (see attachments).
I'm including the hwloc output as well.
AAAaaand you can't upload tgz files to GitHub, so I'm sharing the files via box.com instead of as attachments (let me know if I messed up the sharing): https://riken-share.box.com/s/16fmvlpnkjwpqjsi3cc07cmc5j27ao5p
Perfect, thank you both for finding, fixing and testing.
I downloaded the hwloc tarball from the box.com page for future testing. I know, GitHub does not like archives as attachments.
Describe the bug
The reported number of PUs does not match the actual number of PUs in the last NUMA region because of the offset caused by inactive PUs. I included the actual layout at the end of the issue, but in general the problem is that the FX1000 has 2+48 (=50) PUs, where the first 2 threads are just assistant threads. The actual working threads are indexed 12-59, but all of the counting and indexing variables are set to the number of online PUs (50). So likwid does not list/collect anything after PU#49.
In my opinion this is the problematic code segment introduced in #447 (if I'm not wrong).
- `count` is set to 50 because that's the number of entries in `/proc/cpuinfo` (included the contents at the end of the issue)
- `LIKWID_HWLOC_NAME(get_nbobjs_by_type)(hwloc_topology, HWLOC_OBJ_PU);` returns 2 for some reason (I'm not that familiar with the `hwloc` API)
- `obj = LIKWID_HWLOC_NAME(get_obj_by_type)(hwloc_topology, HWLOC_OBJ_PU, i);` returns an `os_index` of 0 and 32 for `i = 0` and `i = 1`, respectively. For `i > 1` it returns `NULL`
I made a temporary fix by hard-coding `cpuid_topology.numHWThreads = 60;` after line 354, which (seemingly) solves the problem. A more elegant solution might be for `parse_cpuinfo` to set `count` to the highest processor ID + 1 instead of the number of entries. Then maybe you can just delete this part and set `cpuid_topology.numHWThreads = count;`
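The suggestion above could look roughly like the following C sketch. It is not LIKWID's actual `parse_cpuinfo`; `scan_cpuinfo` and `CpuinfoScan` are hypothetical names, and the function only assumes that each entry starts with a `processor : N` line, as in the cpuinfo dump at the end of this issue.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not LIKWID's parse_cpuinfo: scans /proc/cpuinfo-style
 * text and reports both candidate values for the HW thread count. */
typedef struct {
    int entries;     /* number of "processor : N" lines */
    int highest_id;  /* highest N seen, -1 if there are no entries */
} CpuinfoScan;

CpuinfoScan scan_cpuinfo(const char *text)
{
    CpuinfoScan r = { 0, -1 };
    assert(text != NULL);

    for (const char *line = text; *line != '\0'; ) {
        int id;
        /* Only lines that begin with "processor" match; whitespace in the
         * format accepts tabs or spaces around the colon. */
        if (sscanf(line, "processor : %d", &id) == 1) {
            r.entries++;
            if (id > r.highest_id) r.highest_id = id;
        }
        const char *nl = strchr(line, '\n');
        if (nl == NULL) break;   /* last line without trailing newline */
        line = nl + 1;
    }
    return r;
}
```

With entries for processors 0, 1 and 12-59, `entries` would be 50 while `highest_id + 1` would be 60, which matches the hard-coded value above.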
To Reproduce
likwid-topology
To Reproduce with a LIKWID command
Please supply the output of the command with `-V 3` added to the command:
Additional context
numactl -H
hwloc-ls -v
cpuinfo
processor : 1 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 12 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 13 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 14 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 15 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 16 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 17 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 18 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 19 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 20 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 21 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 22 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 23 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 24 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 25 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 26 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 27 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 28 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 29 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 30 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 31 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 32 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 33 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 34 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 35 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 36 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 37 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 38 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 39 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 40 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 41 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 42 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 43 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 44 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 45 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 46 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 47 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 48 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 49 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 50 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 51 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 52 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 53 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 54 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 55 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 56 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 57 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 58 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0
processor : 59 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve CPU implementer : 0x46 CPU architecture: 8 CPU variant : 0x1 CPU part : 0x001 CPU revision : 0