OpenXiangShan / XiangShan

Open-source high-performance RISC-V processor

CI: enable PGO when building emu for CI #3080

Closed (by cyyself, 2 weeks ago)

cyyself commented 2 weeks ago

This PR enables profile-guided optimization (PGO) when building the emu for CI. The training workload is coremark-2-iteration. To keep bugs in the CPU design from making the PGO training run fail, this PR runs the training workload with --no-diff.
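A PGO build normally has three stages: build an instrumented binary, run a representative workload to collect a profile, and rebuild using that profile. The sketch below lays those stages out as data for GCC and Clang. It is not the actual CI script from this PR. The function and stage names, the profile directory, and the `./build/emu` path are hypothetical. `-fprofile-generate`, `-fprofile-use`, and `llvm-profdata merge` are the standard GCC/Clang/LLVM tools. It also assumes the flags reach both the C++ compile and the link step of the Verilator build.

```python
from dataclasses import dataclass, field


@dataclass
class PgoStage:
    name: str
    flags: list[str] = field(default_factory=list)  # extra compile+link flags (assumed)
    command: str = ""                               # shell command run in this stage


def pgo_plan(compiler: str, profile_dir: str, workload: str) -> list[PgoStage]:
    """Hypothetical three-stage PGO plan for a Verilator-generated emu."""
    if compiler not in ("gcc", "clang"):
        raise ValueError(f"unsupported compiler: {compiler}")

    # Stage 1: instrumented build that records branch and call counts at run time.
    stages = [PgoStage("instrumented build", flags=[f"-fprofile-generate={profile_dir}"])]

    # Stage 2: training run. --no-diff disables difftest, so a mismatch caused
    # by a design bug does not abort profile collection.
    stages.append(PgoStage("training run", command=f"./build/emu -i {workload} --no-diff"))

    if compiler == "clang":
        # Clang emits raw .profraw files that must be merged before use.
        merged = f"{profile_dir}/merged.profdata"
        stages.append(PgoStage(
            "merge profiles",
            command=f"llvm-profdata merge -o {merged} {profile_dir}/*.profraw"))
        use_flag = f"-fprofile-use={merged}"
    else:
        # GCC reads the .gcda files directly from the profile directory.
        use_flag = f"-fprofile-use={profile_dir}"

    # Stage 3: rebuild the emu, optimized with the collected profile.
    stages.append(PgoStage("optimized build", flags=[use_flag]))
    return stages


if __name__ == "__main__":
    for stage in pgo_plan("clang", "pgo-profile", "coremark-2-iteration.bin"):
        print(stage.name, stage.flags, stage.command)
```

Representing the stages as data keeps the compiler-specific difference visible. Clang needs an explicit merge step and GCC does not.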

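Preliminary results are shown below. All times are in seconds. Speed Up % is (Run 7051 / Run 7059 - 1) × 100, so a positive value means Run 7059 finished that step faster.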

| Item | Run 7059 (s) | Run 7051 (s) | Speed Up % |
|---|---:|---:|---:|
| Set up job | 2 | 2 | |
| Run actions/checkout@v2 | 41 | 40 | |
| set env | 31 | 31 | |
| clean up | 31 | 32 | |
| Build EMU | 5193 | 3568 | -31.29 |
| Basic Test - cputest | 179 | 255 | 42.46 |
| Basic Test - riscv-tests | 413 | 649 | 57.14 |
| Basic Test - misc-tests | 1052 | 1916 | 82.13 |
| Basic Test - nodiff-tests | 38 | 42 | 10.53 |
| Simple Test - microbench | 141 | 188 | 33.33 |
| Simple Test - CoreMark | 492 | 897 | 82.32 |
| System Test - Linux | 2180 | 3793 | 73.99 |
| Floating-point Test - povray | 785 | 1402 | 78.60 |
| Uncache Fetch Test - copy and run | 703 | 555 | -21.05 |
| Post Run actions/checkout@v2 | 33 | 32 | |
| Complete job | 1 | 0 | |
| All - EMU - Basic | 11315 | 13402 | 18.44 |
| Set up job | 3 | 2 | |
| Run actions/checkout@v2 | 44 | 44 | |
| set env | 31 | 31 | |
| clean up | 31 | 32 | |
| Build EMU | 4707 | 3536 | -24.88 |
| SPEC06 Test - mcf | 3076 | 6037 | 96.26 |
| SPEC06 Test - xalancbmk | 744 | 1320 | 77.42 |
| SPEC06 Test - gcc | 1658 | 2946 | 77.68 |
| SPEC06 Test - namd | 798 | 1036 | 29.82 |
| SPEC06 Test - milc | 1809 | 2836 | 56.77 |
| SPEC06 Test - lbm | 941 | 1617 | 71.84 |
| SPEC06 Test - gromacs | 754 | 1303 | 72.81 |
| SPEC06 Test - wrf | 945 | 1631 | 72.59 |
| SPEC06 Test - astar | 1200 | 1993 | 66.08 |
| Post Run actions/checkout@v2 | 34 | 32 | |
| Complete job | 0 | 1 | |
| All - EMU - Performance | 16775 | 24397 | 45.44 |
| Set up job | 2 | 2 | |
| Run actions/checkout@v2 | 42 | 15 | |
| set env | 32 | 31 | |
| clean up | 30 | 31 | |
| Build MC EMU | 6166 | 5497 | -10.85 |
| MC Test | 102 | 110 | 7.84 |
| SMP Linux | 10530 | 13139 | 24.78 |
| Post Run actions/checkout@v2 | 33 | 32 | |
| Complete job | 1 | 2 | |
| All - EMU - MC | 16938 | 18859 | 11.34 |
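The Speed Up % column can be reproduced from the two time columns. The sketch below does this for the "EMU - Basic" job. It assumes Run 7059 used the PGO-built emu and Run 7051 did not. That reading is consistent with the numbers, since Run 7059 spends longer in Build EMU (instrumented build, training run, and rebuild) and less in the tests. The script also shows that each "All" row is simply the sum of that job's steps.

```python
# Steps of the "EMU - Basic" job: (item, Run 7059 s, Run 7051 s).
# Assumption: Run 7059 used the PGO-built emu and Run 7051 is the baseline.
BASIC_JOB = [
    ("Set up job", 2, 2),
    ("Run actions/checkout@v2", 41, 40),
    ("set env", 31, 31),
    ("clean up", 31, 32),
    ("Build EMU", 5193, 3568),
    ("Basic Test - cputest", 179, 255),
    ("Basic Test - riscv-tests", 413, 649),
    ("Basic Test - misc-tests", 1052, 1916),
    ("Basic Test - nodiff-tests", 38, 42),
    ("Simple Test - microbench", 141, 188),
    ("Simple Test - CoreMark", 492, 897),
    ("System Test - Linux", 2180, 3793),
    ("Floating-point Test - povray", 785, 1402),
    ("Uncache Fetch Test - copy and run", 703, 555),
    ("Post Run actions/checkout@v2", 33, 32),
    ("Complete job", 1, 0),
]


def speed_up_pct(t_pgo: float, t_base: float) -> float:
    """Extra wall time the baseline needs relative to the PGO run, in percent.

    Positive: the PGO run was faster. Negative: the PGO run was slower.
    """
    return (t_base / t_pgo - 1.0) * 100.0


# The "All - EMU - Basic" row is the sum of every step in the job.
total_pgo = sum(t for _, t, _ in BASIC_JOB)
total_base = sum(t for _, _, t in BASIC_JOB)

for item, t_pgo, t_base in BASIC_JOB:
    # Report only build and test steps, as the table does.
    if "Build" in item or "Test" in item:
        print(f"{item:36s} {speed_up_pct(t_pgo, t_base):7.2f}")
print(f"{'All - EMU - Basic':36s} {speed_up_pct(total_pgo, total_base):7.2f}")
```

Under this definition, the negative build rows are the one-time cost of PGO. The positive test rows are the time saved on every simulation.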

Because the CI server is currently AMD Zen 2, Verilator performance is still limited by the small L3 cache and low core count per CCX (4 cores on Zen 2). As a result, an 8-thread emu has to transfer cache lines between CCXs. I suggest moving CI to an AMD Zen 3 server, which doubles both the L3 size and the core count per CCX.
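The following sketch shows the CCX arithmetic and a way to inspect L3 sharing on a Linux host. With 4 cores per CCX (Zen 2), an 8-thread emu must span at least 2 CCXs. With 8 cores per CCX (Zen 3), it can fit in one. The helper names are hypothetical. The sketch reads the standard Linux sysfs cache files (`cache/index*/level` and `shared_cpu_list`) and assumes one emu thread per physical core. Note that `shared_cpu_list` also lists SMT siblings.

```python
import glob
import math
import os


def parse_cpu_list(text: str) -> list[int]:
    """Expand a Linux CPU list such as '0-3,64-67' into individual CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def l3_domains(sysfs: str = "/sys/devices/system/cpu") -> set[tuple[int, ...]]:
    """Group logical CPUs by shared L3 cache (one group per CCX on Zen 2/Zen 3)."""
    domains: set[tuple[int, ...]] = set()
    for cache_dir in glob.glob(os.path.join(sysfs, "cpu[0-9]*", "cache", "index[0-9]*")):
        try:
            with open(os.path.join(cache_dir, "level")) as f:
                if f.read().strip() != "3":
                    continue  # skip L1/L2 entries
            with open(os.path.join(cache_dir, "shared_cpu_list")) as f:
                domains.add(tuple(parse_cpu_list(f.read())))
        except OSError:
            continue  # cache info missing or unreadable
    return domains


def ccx_needed(threads: int, cores_per_ccx: int) -> int:
    """Minimum number of CCXs an emu with `threads` threads spans (one thread per core)."""
    return math.ceil(threads / cores_per_ccx)


if __name__ == "__main__":
    print("Zen 2:", ccx_needed(8, 4), "CCX(s)")
    print("Zen 3:", ccx_needed(8, 8), "CCX(s)")
    for domain in sorted(l3_domains()):
        print("L3 shared by CPUs:", domain)
```

Pinning the emu to CPUs from a single L3 domain (for example, with `taskset`) would be one way to test the inter-CCX hypothesis on the current server before switching hardware.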

XiangShanRobot commented 2 weeks ago
[Generated by IPC robot]
commit: 7a4088381af74e330b03402daa40b65204398e8d

| commit | astar | copy_and_run | coremark | gcc | gromacs | lbm | linux | mcf | microbench | milc | namd | povray | wrf | xalancbmk |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 7a40883 | 1.805 | 0.448 | 2.039 | 1.187 | 2.936 | 2.508 | 2.197 | 0.930 | 1.378 | 1.441 | 3.428 | 2.669 | 2.399 | 2.932 |

master branch:

| commit | astar | copy_and_run | coremark | gcc | gromacs | lbm | linux | mcf | microbench | milc | namd | povray | wrf | xalancbmk |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 26c1abd | 1.803 | 0.448 | 2.039 | 1.187 | 2.936 | 2.508 | 2.197 | 0.930 | 1.378 | 1.441 | 3.428 | 2.669 | 2.399 | 2.932 |
| fd3aa05 | 1.803 | 0.448 | 2.040 | 1.187 | 2.936 | 2.508 | 2.197 | 0.930 | 1.378 | 1.441 | 3.428 | 2.669 | 2.399 | 2.932 |
| 1b0de92 | 1.808 | 0.447 | 2.043 | 1.187 | 2.938 | 2.508 | 2.197 | 0.921 | 1.369 | 1.441 | 3.454 | 2.658 | 2.399 | 2.932 |
| d8a998b | 1.808 | 0.447 | 2.043 | 1.187 | 2.938 | 2.508 | 2.197 | 0.921 | 1.369 | 1.441 | 3.454 | 2.658 | 2.399 | 2.932 |
| ee8d1f1 | 1.808 | 0.447 | 2.043 | 1.187 | 2.938 | 2.508 | 2.197 | 0.921 | 1.369 | 1.441 | 3.454 | 2.658 | 2.399 | 2.932 |
| fcec058 | 1.808 | 0.447 | 2.043 | 1.187 | 2.938 | 2.508 | 2.197 | 0.921 | 1.369 | 1.441 | 3.454 | 2.658 | 2.399 | 2.932 |
| 0fbf39a | 1.808 | 0.447 | 2.043 | 1.187 | 2.938 | 2.508 | 2.197 | 0.921 | 1.369 | 1.441 | 3.454 | 2.658 | 2.399 | 2.932 |
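PGO changes only how the host compiler optimizes the emu binary, so the simulated IPC would be expected to stay the same. The sketch below compares the robot's IPC row for 7a40883 against the newest master commit listed, 26c1abd. The variable and function names are hypothetical. The values are copied from the tables above. The only benchmark that differs is astar (1.805 vs 1.803, about +0.11%).

```python
# IPC values copied from the robot's report, in the header's benchmark order.
BENCHMARKS = ["astar", "copy_and_run", "coremark", "gcc", "gromacs", "lbm", "linux",
              "mcf", "microbench", "milc", "namd", "povray", "wrf", "xalancbmk"]
PR_7A40883 = [1.805, 0.448, 2.039, 1.187, 2.936, 2.508, 2.197,
              0.930, 1.378, 1.441, 3.428, 2.669, 2.399, 2.932]
MASTER_26C1ABD = [1.803, 0.448, 2.039, 1.187, 2.936, 2.508, 2.197,
                  0.930, 1.378, 1.441, 3.428, 2.669, 2.399, 2.932]


def ipc_deltas(new: list[float], base: list[float], names: list[str]) -> dict[str, float]:
    """Relative IPC change in percent, only for benchmarks whose IPC differs."""
    return {n: (a / b - 1.0) * 100.0 for n, a, b in zip(names, new, base) if a != b}


for name, pct in ipc_deltas(PR_7A40883, MASTER_26C1ABD, BENCHMARKS).items():
    print(f"{name}: {pct:+.2f}%")  # prints "astar: +0.11%"
```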