proywm opened this issue 7 years ago
Hello,
Thank you for reporting the issue. I just pushed a commit that should fix the problem. In brief, the latencies of hardware contexts were incorrectly clustered together with the 0 self-latencies, so mctop did not create separate groups for a single hardware context versus the two hardware contexts that share a core.
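For illustration, here is a minimal sketch of gap-based 1-D latency clustering. This is not mctop's actual algorithm; the gap threshold and the sample latencies are made-up values that only roughly match the cluster ranges reported in this thread. It shows how sibling latencies recorded as 0 collapse the self and same-core clusters into one:

    #include <stdio.h>
    #include <stdlib.h>

    static int cmp(const void *a, const void *b)
    {
        return (*(const int *)a > *(const int *)b) -
               (*(const int *)a < *(const int *)b);
    }

    /* Sort the samples and start a new cluster whenever the gap between
     * consecutive latencies exceeds `gap` cycles. */
    static void cluster(int *lat, int n, int gap)
    {
        qsort(lat, n, sizeof *lat, cmp);
        int c = 0;
        printf("cluster %d: %d", c, lat[0]);
        for (int i = 1; i < n; i++) {
            if (lat[i] - lat[i - 1] > gap)
                printf("\ncluster %d: %d", ++c, lat[i]);
            else
                printf(" %d", lat[i]);
        }
        printf("\n");
    }

    int main(void)
    {
        /* Expected: self (0), SMT siblings (~16-18), remote (~460+). */
        int good[]  = { 0, 16, 17, 18, 460, 465, 470 };
        /* With the bug, sibling latencies were recorded as 0, so the
         * first two clusters merge and the core-level group is lost. */
        int buggy[] = { 0, 0, 0, 0, 460, 465, 470 };
        puts("correct sibling latencies:");
        cluster(good, 7, 10);
        puts("buggy sibling latencies:");
        cluster(buggy, 7, 10);
        return 0;
    }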
Please let me know if mctop now works properly on this machine.
Vasilis.
By the way, there is huge variance across sockets:
#2 : size 162 / range 456 - 972 / median: 690
and even the minimum 1-hop latency is 456 cycles, which is much higher than what I have ever seen.
I would be very interested to see the output of mctop (if successful and you would like to share) :-)
Thanks for the quick reply. Now it's getting a segmentation fault. Sometimes it causes a "Floating point exception".
############################################################## Segmentation fault
From what I see, it crashed on "Calculating cache latencies / sizes," which is not an essential part of the topology creation. I disabled this step for now (through a commit) and I will try to investigate when I have the time.
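For what it's worth, a common way to implement such a step (sketched below under my own assumptions, not necessarily how mctop's disabled step works) is to pointer-chase through buffers of increasing size and watch the per-access latency jump at each cache boundary:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Average latency of a dependent-load chain over a buffer of
     * n_elems size_t slots.  A single-cycle random permutation
     * (Sattolo's algorithm) defeats the hardware prefetcher. */
    static double chase_ns(size_t n_elems)
    {
        size_t *next = malloc(n_elems * sizeof *next);
        for (size_t i = 0; i < n_elems; i++)
            next[i] = i;
        for (size_t i = n_elems - 1; i > 0; i--) {
            size_t j = rand() % i;            /* j in [0, i-1] */
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }
        volatile size_t idx = 0;
        const size_t steps = 5 * 1000 * 1000;
        struct timespec s, e;
        clock_gettime(CLOCK_MONOTONIC, &s);
        for (size_t i = 0; i < steps; i++)
            idx = next[idx];                  /* serialized loads */
        clock_gettime(CLOCK_MONOTONIC, &e);
        free(next);
        double ns = (e.tv_sec - s.tv_sec) * 1e9 + (e.tv_nsec - s.tv_nsec);
        return ns / steps;
    }

    int main(void)
    {
        /* Latency plateaus mark the L1/L2/L3/DRAM boundaries. */
        for (size_t kb = 16; kb <= 64 * 1024; kb *= 2)
            printf("%6zu KB: %5.1f ns/access\n",
                   kb, chase_ns(kb * 1024 / sizeof(size_t)));
        return 0;
    }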
I hope it will work this time :-)
Thanks for your help.
Unfortunately it's still getting a segfault. I have attached the generated mct file.
########################################################################## Segmentation fault
Thanks, Probir
I really don't like the numbers: they shouldn't be that high on a four-socket machine. (If you replace 908 with 600 in the mct file, loading the topology works.)
Can you send me the output of ./mctop -m0 -r5000 -f2 -v ?
Thanks, it seems that there are a couple of measurements that are off. Even if they were not off, the current implementation of mctop would not work on such an asymmetric topology.
If you are not bored, one more test that you can run is with the manually fixed topology that I described before. In the server.mct file that you shared earlier, replace 908 with 600 and leave the file in the desc folder. Then, you can execute ./mctop -a to get memory latencies and bandwidths. These measurements will show us whether the asymmetry that we see truly exists.
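(For convenience, the replacement can be done with something like sed -i 's/\b908\b/600/g' desc/server.mct, assuming GNU sed and that 908 appears in the file only as that latency value.)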
Thanks again and sorry for all these complications!
One of the ideas that I want to implement at some point (so far I haven't had the need) is a backup plan for when topology creation fails: read the topology from the OS and augment it with measurements.
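As a sketch of what that fallback could look like (my own illustration using the standard Linux sysfs topology files, not existing mctop code; the loop bound of 8 CPUs is a placeholder):

    #include <stdio.h>

    /* Read one integer field from the standard sysfs topology
     * interface, e.g. physical_package_id (socket) or core_id. */
    static int topo_field(int cpu, const char *field)
    {
        char path[128];
        int val = -1;
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/topology/%s", cpu, field);
        FILE *f = fopen(path, "r");
        if (f) {
            if (fscanf(f, "%d", &val) != 1)
                val = -1;
            fclose(f);
        }
        return val;
    }

    int main(void)
    {
        /* 8 hw contexts as a demo; a real tool would first query the
         * number of online CPUs. */
        for (int cpu = 0; cpu < 8; cpu++)
            printf("cpu%d: socket %d, core %d\n", cpu,
                   topo_field(cpu, "physical_package_id"),
                   topo_field(cpu, "core_id"));
        return 0;
    }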
Vasilis.
Indeed, they are symmetric, with one weaker link each -- very interesting topology :)
My best "guess" is that the problem is due to DVFS (frequency scaling).
For now, I have made the DVFS handling more aggressive. You could try:
./mctop -f2 -c30 -r5000 -d3
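For context, the basic idea behind such DVFS handling is to keep the cores busy until the governor has ramped the frequency up, and only then take measurements. A minimal sketch of that idea, with a made-up spin duration and nothing mctop-specific:

    #include <stdint.h>
    #include <time.h>

    /* Busy-spin for `ms` milliseconds so the DVFS governor raises the
     * core to its full frequency before latency measurements start. */
    static void dvfs_warmup(long ms)
    {
        struct timespec start, now;
        volatile uint64_t sink = 0;
        clock_gettime(CLOCK_MONOTONIC, &start);
        do {
            for (int i = 0; i < 10000; i++)
                sink += i;                      /* keep the core busy */
            clock_gettime(CLOCK_MONOTONIC, &now);
        } while ((now.tv_sec - start.tv_sec) * 1000L +
                 (now.tv_nsec - start.tv_nsec) / 1000000L < ms);
    }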
I have some more proper solutions for multi-cores such as this one, but I need to find the time to implement them.
Thanks, once again.
I forgot to mention the -i option of mctop.
./mctop -f2 -c30 -r5000 -d3 -i5
will explicitly try to find a clustering with 5 latency clusters...
Hi Probir,
The "good" news is that mctop
did a very reasonable clustering. The bad news is that there are some outlier values that cannot be clustered together. I wrote two scripts to help us with debugging the problem:
You can invoke:
./scripts/ccbench.sh -x13 -y16 and then ./scripts/ccbench.sh -x14 -y16
./scripts/ccbench.sh -x13 -y50, then ./scripts/ccbench.sh -x13 -y51, and then ./scripts/ccbench.sh -x14 -y50
./scripts/ccbench_map.sh
Essentially, we are measuring some problematic latencies manually, to figure out if it's mctop's problem.
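To make the idea concrete, here is a minimal sketch of the kind of cache-line ping-pong such a measurement performs. The core ids 13 and 16 come from the commands above; the thread setup, rdtsc timing, and iteration count are my own illustrative choices, not ccbench's actual code (x86-only, compile with -pthread):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>

    #define REPS 100000

    static volatile uint64_t line __attribute__((aligned(64)));
    static volatile int turn;        /* 0: writer's turn, 1: reader's */

    static void pin(int core)        /* pin calling thread to a core */
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        pthread_setaffinity_np(pthread_self(), sizeof set, &set);
    }

    static void *writer(void *arg)
    {
        pin(*(int *)arg);
        for (int i = 0; i < REPS; i++) {
            while (turn != 0) ;      /* volatile spin; toy code only,  */
            line = i;                /* real code should use atomics   */
            turn = 1;
        }
        return NULL;
    }

    static void *reader(void *arg)
    {
        pin(*(int *)arg);
        uint64_t total = 0;
        for (int i = 0; i < REPS; i++) {
            while (turn != 1) ;
            uint64_t s = __rdtsc();
            uint64_t v = line;       /* volatile load pulls the dirty
                                      * line across the interconnect */
            total += __rdtsc() - s;
            (void)v;
            turn = 0;
        }
        printf("avg cross-core load: %.1f cycles\n",
               (double)total / REPS);
        return NULL;
    }

    int main(void)
    {
        int x = 13, y = 16;          /* cores as in -x13 -y16 */
        pthread_t a, b;
        pthread_create(&a, NULL, writer, &x);
        pthread_create(&b, NULL, reader, &y);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }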
Output of the scripts:
Well, it is not mctop's problem :-( Look at the latencies with Node 0:
0 <-> 1 : 526 376    1 <-> 0 : 382 536
0 <-> 2 : 611 503    2 <-> 0 : 485 583
0 <-> 3 : 492 335    3 <-> 0 : 378 547
It is faster for other nodes to receive data from Node 0 than for Node 0 to access other nodes.
Other than that:
0 <--> 0 : 105.9  107.3
0 <--> 1 : 526.6  376.2
0 <--> 2 : 611.6  503.9
0 <--> 3 : 492.4  335.2
1 <--> 0 : 382.2  536.0
1 <--> 1 : 120.6  121.6
1 <--> 2 : 843.4  823.6
1 <--> 3 : 781.9  728.2
2 <--> 0 : 485.6  583.9
2 <--> 1 : 824.8  843.0
2 <--> 2 : 104.0  106.8
2 <--> 3 : 786.1  771.8
3 <--> 0 : 378.8  547.9
3 <--> 1 : 766.2  812.3
3 <--> 2 : 777.0  785.3
3 <--> 3 :  97.7
The other nodes are quite reasonably connected.
Bottom line is that the current implementation of mctop does not support this type of asymmetry.
The mctop run is getting aborted. The lstopo and cpuinfo outputs are attached.
MCTOP Settings:
Machine name : server
Output : MCT description file
Repetitions : 2000
Do-memory : Latency+Bandwidth on topology
Mem. size bw : 512 MB
Cluster-offset : 20
Max std dev : 7
Cores : 64
Sockets : 4
Hint : 0 clusters
CPU DVFS : 1 (Freq. up in 160 ms)
Progress : 100.0% completed in 49.3 secs (step took 0.2 secs)
CDF Clusters
0 : size 3 / range 0 - 20 / median: 16
1 : size 16 / range 72 - 108 / median: 94
2 : size 162 / range 456 - 972 / median: 690
##############################################################
CPU is SMT: 1
Lat table
MCTOP output in: ./desc/server.mct
##########################################################################
mctop: src/mctop_topology.c:851: mctop_fix_n_hwcs_per_core_smt: Assertion `gs->type == CORE' failed.
Aborted
OS configuration: Linux server 3.10.0-229.4.2.el7.x86_64 #1 SMP
cpuinfo.txt lstopo.txt