Open ocaisa opened 3 years ago
@ocaisa, fwiw, I get (on an AMD Naples node) 1 min 18 secs for foss and 1 min 39 secs for intel, so the problem seems to be system dependent?
That's good, maybe it was just other processes on the login node of Generoso where I ran the test.
I'm also seeing this. I was running inside a cgroup of 8 cores out of a 40 core cascadelake box:
And looking at the logs, each of the tests is about 25 times slower.
Edit: similar results with access to the full node as well. I stopped the foss one after 10 minutes vs 57.05 sec
Then my guess would be OMP_NUM_THREADS and OpenBLAS
I did notice that the MPI tests were using 200% CPU (even though the MPI call had -np 4 ... and Generoso has 8 cores)
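One way to test the OMP_NUM_THREADS / OpenBLAS guess is to compare MPI ranks times threads per rank against the available cores, then rerun the tests with an explicit thread limit. A minimal sketch in Python; the helper names `threads_per_rank` and `threading_env` are made up for illustration, and it assumes the threaded libraries in use honor `OMP_NUM_THREADS` and `OPENBLAS_NUM_THREADS`:

```python
import os


def threads_per_rank(total_cores: int, mpi_ranks: int) -> int:
    """Largest per-rank thread count that keeps ranks * threads <= cores."""
    if total_cores < 1 or mpi_ranks < 1:
        raise ValueError("total_cores and mpi_ranks must be positive")
    # Never go below one thread, even when there are more ranks than cores.
    return max(1, total_cores // mpi_ranks)


def threading_env(total_cores: int, mpi_ranks: int) -> dict:
    """Copy of the current environment with OpenMP/OpenBLAS thread limits set."""
    n = str(threads_per_rank(total_cores, mpi_ranks))
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = n
    env["OPENBLAS_NUM_THREADS"] = n
    return env


if __name__ == "__main__":
    # Numbers quoted in this thread: 8 cores on the node, 4 MPI ranks per test.
    env = threading_env(total_cores=8, mpi_ranks=4)
    print("threads per rank:", env["OMP_NUM_THREADS"])
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching `ctest` from the build directory. With 8 cores and `-np 4` this gives 2 threads per rank, so if the 200% figure is per process it would not by itself indicate oversubscription.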
For comparison, the build logs on Generoso for the foss and intel tests:
Test project /home/ocaisa/.local/easybuild/build/ELSI/2.6.4/foss-2020b-PEXSI/easybuild_obj
Start 1: test_fortran_01_elpa
1/56 Test #1: test_fortran_01_elpa ............. Passed 28.51 sec
Start 2: test_fortran_02_elpa
2/56 Test #2: test_fortran_02_elpa ............. Passed 22.86 sec
Start 3: test_fortran_03_elpa
3/56 Test #3: test_fortran_03_elpa ............. Passed 29.36 sec
Start 4: test_fortran_04_elpa
4/56 Test #4: test_fortran_04_elpa ............. Passed 21.66 sec
Start 5: test_fortran_05_elpa
5/56 Test #5: test_fortran_05_elpa ............. Passed 28.47 sec
Start 6: test_fortran_06_elpa
6/56 Test #6: test_fortran_06_elpa ............. Passed 24.98 sec
Start 7: test_fortran_07_elpa
7/56 Test #7: test_fortran_07_elpa ............. Passed 28.61 sec
Start 8: test_fortran_08_elpa
8/56 Test #8: test_fortran_08_elpa ............. Passed 22.14 sec
Start 9: test_fortran_09_elpa
9/56 Test #9: test_fortran_09_elpa ............. Passed 25.28 sec
Start 10: test_fortran_10_elpa
10/56 Test #10: test_fortran_10_elpa ............. Passed 14.28 sec
Start 11: test_fortran_11_elpa
11/56 Test #11: test_fortran_11_elpa ............. Passed 25.44 sec
Start 12: test_fortran_12_elpa
12/56 Test #12: test_fortran_12_elpa ............. Passed 14.31 sec
Start 13: test_fortran_13_elpa
13/56 Test #13: test_fortran_13_elpa ............. Passed 25.59 sec
Start 14: test_fortran_14_elpa
14/56 Test #14: test_fortran_14_elpa ............. Passed 14.25 sec
Start 15: test_fortran_15_elpa
15/56 Test #15: test_fortran_15_elpa ............. Passed 26.60 sec
Start 16: test_fortran_16_elpa
16/56 Test #16: test_fortran_16_elpa ............. Passed 14.56 sec
Start 17: test_fortran_01_omm
17/56 Test #17: test_fortran_01_omm .............. Passed 12.23 sec
Start 18: test_fortran_02_omm
18/56 Test #18: test_fortran_02_omm .............. Passed 13.99 sec
Start 19: test_fortran_03_omm
19/56 Test #19: test_fortran_03_omm .............. Passed 11.97 sec
Start 20: test_fortran_04_omm
20/56 Test #20: test_fortran_04_omm .............. Passed 9.63 sec
Start 21: test_fortran_05_omm
21/56 Test #21: test_fortran_05_omm .............. Passed 11.79 sec
Start 22: test_fortran_06_omm
22/56 Test #22: test_fortran_06_omm .............. Passed 9.63 sec
Start 23: test_fortran_07_omm
23/56 Test #23: test_fortran_07_omm .............. Passed 11.58 sec
Start 24: test_fortran_08_omm
24/56 Test #24: test_fortran_08_omm .............. Passed 9.32 sec
Start 25: test_fortran_01_pexsi
25/56 Test #25: test_fortran_01_pexsi ............ Passed 151.37 sec
Start 26: test_fortran_02_pexsi
26/56 Test #26: test_fortran_02_pexsi ............ Passed 166.81 sec
Start 27: test_fortran_03_pexsi
27/56 Test #27: test_fortran_03_pexsi ............ Passed 151.99 sec
Start 28: test_fortran_04_pexsi
28/56 Test #28: test_fortran_04_pexsi ............ Passed 163.70 sec
Start 29: test_fortran_05_pexsi
29/56 Test #29: test_fortran_05_pexsi ............ Passed 152.93 sec
Start 30: test_fortran_06_pexsi
30/56 Test #30: test_fortran_06_pexsi ............ Passed 167.22 sec
Start 31: test_fortran_07_pexsi
31/56 Test #31: test_fortran_07_pexsi ............ Passed 153.46 sec
Start 32: test_fortran_08_pexsi
32/56 Test #32: test_fortran_08_pexsi ............ Passed 162.83 sec
Start 33: test_fortran_01_ntpoly
33/56 Test #33: test_fortran_01_ntpoly ........... Passed 28.29 sec
Start 34: test_fortran_02_ntpoly
34/56 Test #34: test_fortran_02_ntpoly ........... Passed 25.97 sec
Start 35: test_fortran_03_ntpoly
35/56 Test #35: test_fortran_03_ntpoly ........... Passed 28.32 sec
Start 36: test_fortran_04_ntpoly
36/56 Test #36: test_fortran_04_ntpoly ........... Passed 29.14 sec
Start 37: test_fortran_05_ntpoly
37/56 Test #37: test_fortran_05_ntpoly ........... Passed 28.67 sec
Start 38: test_fortran_06_ntpoly
38/56 Test #38: test_fortran_06_ntpoly ........... Passed 26.51 sec
Start 39: test_fortran_07_ntpoly
39/56 Test #39: test_fortran_07_ntpoly ........... Passed 28.05 sec
Start 40: test_fortran_08_ntpoly
40/56 Test #40: test_fortran_08_ntpoly ........... Passed 27.69 sec
Start 41: test_serial_01_lapack
41/56 Test #41: test_serial_01_lapack ............ Passed 0.37 sec
Start 42: test_serial_02_lapack
42/56 Test #42: test_serial_02_lapack ............ Passed 0.52 sec
Start 43: test_matio_01
43/56 Test #43: test_matio_01 .................... Passed 0.25 sec
Start 44: test_matio_02
44/56 Test #44: test_matio_02 .................... Passed 0.28 sec
Start 45: test_matio_03
45/56 Test #45: test_matio_03 .................... Passed 0.37 sec
Start 46: test_matio_04
46/56 Test #46: test_matio_04 .................... Passed 0.38 sec
Start 47: test_c_01_elpa
47/56 Test #47: test_c_01_elpa ................... Passed 12.03 sec
Start 48: test_c_02_elpa
48/56 Test #48: test_c_02_elpa ................... Passed 5.39 sec
Start 49: test_c_03_elpa
49/56 Test #49: test_c_03_elpa ................... Passed 6.33 sec
Start 50: test_c_04_elpa
50/56 Test #50: test_c_04_elpa ................... Passed 4.08 sec
Start 51: test_c_01_omm
51/56 Test #51: test_c_01_omm .................... Passed 6.44 sec
Start 52: test_c_02_omm
52/56 Test #52: test_c_02_omm .................... Passed 4.09 sec
Start 53: test_c_01_pexsi
53/56 Test #53: test_c_01_pexsi .................. Passed 43.61 sec
Start 54: test_c_02_pexsi
54/56 Test #54: test_c_02_pexsi .................. Passed 87.91 sec
Start 55: test_c_01_ntpoly
55/56 Test #55: test_c_01_ntpoly ................. Passed 8.71 sec
Start 56: test_c_02_ntpoly
56/56 Test #56: test_c_02_ntpoly ................. Passed 8.01 sec
100% tests passed, 0 tests failed out of 56
Total Test time (real) = 2139.32 sec
and
Test project /home/ocaisa/.local/easybuild/build/ELSI/2.6.4/intel-2020b-PEXSI/easybuild_obj
Start 1: test_fortran_01_elpa
1/56 Test #1: test_fortran_01_elpa ............. Passed 1.55 sec
Start 2: test_fortran_02_elpa
2/56 Test #2: test_fortran_02_elpa ............. Passed 0.75 sec
Start 3: test_fortran_03_elpa
3/56 Test #3: test_fortran_03_elpa ............. Passed 0.72 sec
Start 4: test_fortran_04_elpa
4/56 Test #4: test_fortran_04_elpa ............. Passed 0.80 sec
Start 5: test_fortran_05_elpa
5/56 Test #5: test_fortran_05_elpa ............. Passed 0.70 sec
Start 6: test_fortran_06_elpa
6/56 Test #6: test_fortran_06_elpa ............. Passed 0.75 sec
Start 7: test_fortran_07_elpa
7/56 Test #7: test_fortran_07_elpa ............. Passed 0.73 sec
Start 8: test_fortran_08_elpa
8/56 Test #8: test_fortran_08_elpa ............. Passed 0.72 sec
Start 9: test_fortran_09_elpa
9/56 Test #9: test_fortran_09_elpa ............. Passed 0.68 sec
Start 10: test_fortran_10_elpa
10/56 Test #10: test_fortran_10_elpa ............. Passed 0.79 sec
Start 11: test_fortran_11_elpa
11/56 Test #11: test_fortran_11_elpa ............. Passed 0.68 sec
Start 12: test_fortran_12_elpa
12/56 Test #12: test_fortran_12_elpa ............. Passed 0.77 sec
Start 13: test_fortran_13_elpa
13/56 Test #13: test_fortran_13_elpa ............. Passed 0.72 sec
Start 14: test_fortran_14_elpa
14/56 Test #14: test_fortran_14_elpa ............. Passed 0.79 sec
Start 15: test_fortran_15_elpa
15/56 Test #15: test_fortran_15_elpa ............. Passed 0.75 sec
Start 16: test_fortran_16_elpa
16/56 Test #16: test_fortran_16_elpa ............. Passed 0.81 sec
Start 17: test_fortran_01_omm
17/56 Test #17: test_fortran_01_omm .............. Passed 0.67 sec
Start 18: test_fortran_02_omm
18/56 Test #18: test_fortran_02_omm .............. Passed 0.82 sec
Start 19: test_fortran_03_omm
19/56 Test #19: test_fortran_03_omm .............. Passed 0.78 sec
Start 20: test_fortran_04_omm
20/56 Test #20: test_fortran_04_omm .............. Passed 0.85 sec
Start 21: test_fortran_05_omm
21/56 Test #21: test_fortran_05_omm .............. Passed 0.76 sec
Start 22: test_fortran_06_omm
22/56 Test #22: test_fortran_06_omm .............. Passed 0.89 sec
Start 23: test_fortran_07_omm
23/56 Test #23: test_fortran_07_omm .............. Passed 0.77 sec
Start 24: test_fortran_08_omm
24/56 Test #24: test_fortran_08_omm .............. Passed 0.86 sec
Start 25: test_fortran_01_pexsi
25/56 Test #25: test_fortran_01_pexsi ............ Passed 3.91 sec
Start 26: test_fortran_02_pexsi
26/56 Test #26: test_fortran_02_pexsi ............ Passed 5.31 sec
Start 27: test_fortran_03_pexsi
27/56 Test #27: test_fortran_03_pexsi ............ Passed 3.88 sec
Start 28: test_fortran_04_pexsi
28/56 Test #28: test_fortran_04_pexsi ............ Passed 5.35 sec
Start 29: test_fortran_05_pexsi
29/56 Test #29: test_fortran_05_pexsi ............ Passed 3.99 sec
Start 30: test_fortran_06_pexsi
30/56 Test #30: test_fortran_06_pexsi ............ Passed 5.23 sec
Start 31: test_fortran_07_pexsi
31/56 Test #31: test_fortran_07_pexsi ............ Passed 4.05 sec
Start 32: test_fortran_08_pexsi
32/56 Test #32: test_fortran_08_pexsi ............ Passed 5.31 sec
Start 33: test_fortran_01_ntpoly
33/56 Test #33: test_fortran_01_ntpoly ........... Passed 3.28 sec
Start 34: test_fortran_02_ntpoly
34/56 Test #34: test_fortran_02_ntpoly ........... Passed 5.45 sec
Start 35: test_fortran_03_ntpoly
35/56 Test #35: test_fortran_03_ntpoly ........... Passed 2.78 sec
Start 36: test_fortran_04_ntpoly
36/56 Test #36: test_fortran_04_ntpoly ........... Passed 25.67 sec
Start 37: test_fortran_05_ntpoly
37/56 Test #37: test_fortran_05_ntpoly ........... Passed 2.98 sec
Start 38: test_fortran_06_ntpoly
38/56 Test #38: test_fortran_06_ntpoly ........... Passed 5.06 sec
Start 39: test_fortran_07_ntpoly
39/56 Test #39: test_fortran_07_ntpoly ........... Passed 3.02 sec
Start 40: test_fortran_08_ntpoly
40/56 Test #40: test_fortran_08_ntpoly ........... Passed 5.55 sec
Start 41: test_serial_01_lapack
41/56 Test #41: test_serial_01_lapack ............ Passed 0.50 sec
Start 42: test_serial_02_lapack
42/56 Test #42: test_serial_02_lapack ............ Passed 0.60 sec
Start 43: test_matio_01
43/56 Test #43: test_matio_01 .................... Passed 0.44 sec
Start 44: test_matio_02
44/56 Test #44: test_matio_02 .................... Passed 0.45 sec
Start 45: test_matio_03
45/56 Test #45: test_matio_03 .................... Passed 0.73 sec
Start 46: test_matio_04
46/56 Test #46: test_matio_04 .................... Passed 0.70 sec
Start 47: test_c_01_elpa
47/56 Test #47: test_c_01_elpa ................... Passed 0.68 sec
Start 48: test_c_02_elpa
48/56 Test #48: test_c_02_elpa ................... Passed 0.69 sec
Start 49: test_c_03_elpa
49/56 Test #49: test_c_03_elpa ................... Passed 0.65 sec
Start 50: test_c_04_elpa
50/56 Test #50: test_c_04_elpa ................... Passed 0.69 sec
Start 51: test_c_01_omm
51/56 Test #51: test_c_01_omm .................... Passed 0.72 sec
Start 52: test_c_02_omm
52/56 Test #52: test_c_02_omm .................... Passed 0.69 sec
Start 53: test_c_01_pexsi
53/56 Test #53: test_c_01_pexsi .................. Passed 1.39 sec
Start 54: test_c_02_pexsi
54/56 Test #54: test_c_02_pexsi .................. Passed 1.85 sec
Start 55: test_c_01_ntpoly
55/56 Test #55: test_c_01_ntpoly ................. Passed 1.38 sec
Start 56: test_c_02_ntpoly
56/56 Test #56: test_c_02_ntpoly ................. Passed 2.17 sec
100% tests passed, 0 tests failed out of 56
Total Test time (real) = 124.60 sec
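To see whether the slowdown is uniform or concentrated in one back-end, the two logs can be compared per solver. A rough sketch in Python, assuming the ctest result lines look like the ones above; the function names are hypothetical, and grouping by the last non-numeric token of the test name (elpa, omm, pexsi, ntpoly, lapack, matio) is just one convenient choice:

```python
import re
from collections import defaultdict

# Matches ctest result lines such as:
#   "1/56 Test #1: test_fortran_01_elpa ............. Passed 28.51 sec"
RESULT = re.compile(r"Test\s+#\d+:\s+(\S+)\s+\.+\s+Passed\s+([\d.]+)\s+sec")


def parse_times(log: str) -> dict:
    """Map test name -> wall time in seconds for every passed test in the log."""
    return {m.group(1): float(m.group(2)) for m in RESULT.finditer(log)}


def backend(test_name: str) -> str:
    """Last underscore-separated token that is not purely numeric."""
    return [tok for tok in test_name.split("_") if not tok.isdigit()][-1]


def slowdown_by_backend(slow_log: str, fast_log: str) -> dict:
    """Ratio of summed test times (slow / fast) per back-end."""
    slow, fast = parse_times(slow_log), parse_times(fast_log)
    totals = defaultdict(lambda: [0.0, 0.0])
    # Only compare tests that appear in both logs.
    for name in slow.keys() & fast.keys():
        totals[backend(name)][0] += slow[name]
        totals[backend(name)][1] += fast[name]
    return {b: s / f for b, (s, f) in totals.items() if f > 0}
```

Feeding the foss log as `slow_log` and the intel log as `fast_log` gives one ratio per back-end. For reference, the reported totals alone (2139.32 sec vs 124.60 sec) correspond to roughly a 17x difference overall.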
weird - just tested again - I'm not restricting the number of threads, and I do see both foss and intel tests using many threads per process, which is probably suboptimal, but in my case they are still quite fast in both cases, and actually a bit faster for foss ...
Can anyone else test on AMD, to see if it is anything to do with Intel (e.g. avx512?)
I see exactly the same discrepancy on an Intel CPU.
When running the foss tests, I can see a 300-400% CPU usage with 4 MPI processes, which seems correct (the node has 16 cores). Restricting the number of OpenMP threads to 1 makes the tests considerably slower.
I think it's also clear that the problem comes from ELSI itself, as the tests seem to run slower on foss for all the back-ends.
Weird.
indeed, on Haswell it also takes too long for me. So it's not just the threads, it also depends on the cpu (?)
In https://github.com/easybuilders/easybuild-easyconfigs/pull/14180, @akesandgren includes a later ELSI with foss/2021a. Tests are faster with that (12 minutes), but still with lots of thread activity and I suspect still slower than intel (UPDATE: the intel version was included in https://github.com/easybuilders/easybuild-easyconfigs/pull/14183 and the tests take under 2 minutes).
One difference I see is that on AMD all the processes and threads are using the same socket. On Intel, the processes span both sockets, which isn't a problem, but threads with the same parent process also span different sockets, which could explain the performance difference.
I'm not using any explicit affinity setting in these tests, so I suppose it's either OpenMPI or OpenBLAS that's setting different threading affinities in Intel and AMD?
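To check the socket observation more systematically, one could list which sockets each thread of a test process is allowed to run on. A Linux-only sketch in Python with hypothetical helper names; it reads the CPU-to-socket mapping from sysfs and the per-thread affinity masks via `os.sched_getaffinity`. Note that the mask only shows where a thread may run, not where it actually ran:

```python
import os
from pathlib import Path


def cpu_to_socket() -> dict:
    """Map logical CPU id -> physical package (socket) id, read from Linux sysfs."""
    mapping = {}
    for cpu_dir in Path("/sys/devices/system/cpu").glob("cpu[0-9]*"):
        pkg = cpu_dir / "topology" / "physical_package_id"
        if pkg.is_file():  # offline CPUs may not expose topology
            mapping[int(cpu_dir.name[3:])] = int(pkg.read_text())
    return mapping


def sockets_spanned(cpus, mapping: dict) -> set:
    """Sockets covered by a collection of logical CPU ids."""
    return {mapping[c] for c in cpus if c in mapping}


def thread_sockets(pid: int, mapping: dict) -> dict:
    """For each thread of `pid`, the sockets its affinity mask allows."""
    result = {}
    for task in Path(f"/proc/{pid}/task").iterdir():
        tid = int(task.name)
        result[tid] = sockets_spanned(os.sched_getaffinity(tid), mapping)
    return result


if __name__ == "__main__":
    # Inspect the current process; replace with the pid of a running test.
    for tid, sockets in thread_sockets(os.getpid(), cpu_to_socket()).items():
        print(tid, sorted(sockets))
```

If no binding is applied by the MPI launcher or the BLAS library, every thread would be expected to report all sockets, which is consistent with threads of one process ending up on different sockets.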
On the other hand, if that was the case, surely this would have been noticed before (?)
curiouser and curiouser...
@vyu16 Maybe you have some hints here?
What is the environment under which this runs? If this happened all the time, we would definitely know. However, in other contexts, I have seen similar issues from mis-configured slurm (i.e., internal defaults in some slurm versions that can be catastrophic). Is slurm involved here?
@volkerblum I ran all my tests, including the slow ones on intel haswell, in an ssh session to a dedicated node that I had offlined from the batch system
In https://github.com/easybuilders/easybuild-easyconfigs/pull/14133, I saw a large discrepancy between the test times for foss and intel (~30 minutes for foss compared to ~2 minutes for intel). Tagging @micaeljtoliveira