brandenburg closed this issue 1 month ago
I have used GCC versions 11.4 and 12.3, but the times you get for these job sets are too fast. I still think you were probably not using the `-m 4` option (as I mentioned in #20, these examples are for 4 cores). For example, for jobset-log-uniform-discrete_69.csv I get this output:
jobsets/jobset-log-uniform-discrete_69.csv, 1, 68767, 1110411, 2843423, 1001, 2625.695838, 54.609375, 0, 4
Yes, the table shows the full invocation. I did not specify `-m 4`; I missed your comment.
I'm seeing this output now (on macOS):
# file name, schedulable?(1Y/0N), #jobs, #nodes, #states, #edges, max width, CPU time, memory, timeout, #CPUs
benchmarks/jobset-log-uniform-discrete_69.csv, 1, 68767, 1031793, 1111366, 2844642, 1001, 5.074074, 141080.000000, 0, 4
It agrees in outcome and #jobs, but not in #nodes or #edges. Is that reasonable? How is that possible?
I guess this is the result that you got with the `v3.0` branch, right? The result I mentioned in the previous comment was for the `master` branch. For me, too, the `v3.0` branch, which uses the new method, gives the same result as you:
jobsets/jobset-log-uniform-discrete_69.csv, 1, 68767, 1031793, 1111366, 2844642, 1001, 4.936089, 70.277344, 0, 4
Here's the outcome of the rerun with `-m 4` on my 2017 iMac with Xcode's clang version:
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 5.511 ± 0.186 | 5.301 | 6.526 | 1.02 ± 0.05 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 5.419 ± 0.178 | 5.320 | 6.601 | 1.00 |
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_73.csv` | 492.6 ± 6.5 | 484.9 | 529.3 | 1.01 ± 0.02 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_73.csv` | 488.4 ± 4.9 | 481.5 | 506.9 | 1.00 |
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_94.csv` | 349.5 ± 3.1 | 344.8 | 365.7 | 1.01 ± 0.01 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_94.csv` | 344.7 ± 2.7 | 341.2 | 361.9 | 1.00 |
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-uniform-discrete_29.csv` | 707.2 ± 11.2 | 696.8 | 764.0 | 1.00 ± 0.02 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-uniform-discrete_29.csv` | 706.6 ± 11.2 | 694.5 | 753.4 | 1.00 |
So at least it's still the case that there's no major performance difference on macOS.
Unfortunately, the Linux numbers I get now confirm the issue also for `g++` 10.2.1. It's rather shocking how big the difference is.
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 433.015 ± 0.270 | 432.657 | 433.591 | 116.65 ± 1.38 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 3.712 ± 0.044 | 3.680 | 3.813 | 1.00 |
Here are updated numbers from Linux. Out of curiosity, I also compiled the two SAG versions with `clang++` version 18.1.8 (which is very recent), to see if `g++` really is the culprit.
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 437.694 ± 5.531 | 433.031 | 445.135 | 115.31 ± 1.65 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 3.796 ± 0.026 | 3.763 | 3.846 | 1.00 |
| `clang++/SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 569.099 ± 4.771 | 559.504 | 573.826 | 149.93 ± 1.61 |
| `clang++/SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 4.105 ± 0.013 | 4.085 | 4.126 | 1.08 ± 0.01 |
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_73.csv` | 3.180 ± 0.029 | 3.149 | 3.257 | 9.68 ± 0.29 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_73.csv` | 0.328 ± 0.009 | 0.311 | 0.337 | 1.00 |
| `clang++/SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_73.csv` | 4.008 ± 0.010 | 3.989 | 4.017 | 12.20 ± 0.35 |
| `clang++/SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_73.csv` | 0.378 ± 0.011 | 0.362 | 0.386 | 1.15 ± 0.05 |
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_94.csv` | 1.596 ± 0.020 | 1.576 | 1.642 | 6.97 ± 0.18 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_94.csv` | 0.229 ± 0.005 | 0.211 | 0.232 | 1.00 |
| `clang++/SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_94.csv` | 2.006 ± 0.011 | 1.991 | 2.017 | 8.76 ± 0.21 |
| `clang++/SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_94.csv` | 0.259 ± 0.002 | 0.256 | 0.261 | 1.13 ± 0.03 |
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-uniform-discrete_29.csv` | 33.472 ± 0.078 | 33.357 | 33.585 | 67.90 ± 1.24 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-uniform-discrete_29.csv` | 0.493 ± 0.009 | 0.473 | 0.500 | 1.00 |
| `clang++/SAG-v3/build/nptest -m 4 benchmarks/jobset-uniform-discrete_29.csv` | 41.914 ± 0.148 | 41.638 | 42.149 | 85.02 ± 1.56 |
| `clang++/SAG-bitfield/build/nptest -m 4 benchmarks/jobset-uniform-discrete_29.csv` | 0.521 ± 0.011 | 0.501 | 0.533 | 1.06 ± 0.03 |
This shows a couple of interesting things:

1. `clang` actually has the very same issue as `g++`, yet `clang` works just fine on macOS. This strongly suggests that the issue is in the STL implementation, and not in the compiler. (AFAIK, `clang++` and `g++` use the same GNU standard library, libstdc++, on Linux, whereas `clang++` uses LLVM's libc++ on macOS.)
2. `clang++` produces moderately longer runtimes than `g++` does, so for SAG, `g++` is still the preferred compiler.
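
If anyone wants to confirm that this is really a libstdc++ `vector<bool>` code-generation problem, a standalone micro-benchmark along the following lines might help. To be clear, this is just a minimal sketch for illustration (not code from the SAG repository, and the sizes and iteration counts are arbitrary): it times sequential membership reads through `std::vector<bool>` against reads from a hand-rolled 64-bit bitfield.

```cpp
// Minimal sketch: sequential bit reads via std::vector<bool> vs. a plain
// uint64_t bitfield. If the library's vector<bool> specialization is the
// culprit, the first loop should be noticeably slower on affected setups.
#include <chrono>
#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    constexpr std::size_t N = 1u << 16;  // number of bits (arbitrary)
    constexpr std::size_t ITERS = 10000; // repetitions (arbitrary)

    std::vector<bool> vb(N, false);
    std::vector<std::uint64_t> words((N + 63) / 64, 0);

    // Mark every third bit in both representations.
    for (std::size_t i = 0; i < N; i += 3) {
        vb[i] = true;
        words[i / 64] |= std::uint64_t{1} << (i % 64);
    }

    std::size_t sum_vb = 0, sum_bf = 0;

    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < ITERS; r++)
        for (std::size_t i = 0; i < N; i++)
            sum_vb += vb[i];
    auto t1 = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < ITERS; r++)
        for (std::size_t i = 0; i < N; i++)
            sum_bf += (words[i / 64] >> (i % 64)) & 1;
    auto t2 = std::chrono::steady_clock::now();

    using msec = std::chrono::duration<double, std::milli>;
    // Print the sums so the compiler cannot elide the loops entirely.
    std::cout << "vector<bool>: " << msec(t1 - t0).count() << " ms (sum "
              << sum_vb << ")\n"
              << "bitfield:     " << msec(t2 - t1).count() << " ms (sum "
              << sum_bf << ")\n";
}
```

If this does not reproduce the gap, the slowdown presumably comes from how the SAG code uses `vector<bool>` (e.g., whole-set operations during state merging) rather than from single-bit accesses.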
Sorry for opening yet another issue, but I figured this is better than continuing to add messages to the already closed PR #20.
I was wondering whether `vector<bool>` being poorly compiled could have affected evaluation results in earlier papers. It certainly would be unfortunate to accidentally have reported needlessly slow results.

To explore this further, I compiled the latest `v3.0` branch and the `index_set-optimization` branch on our group's main Linux compute machine and compared the runtime performance with `hyperfine`. To my surprise (and maybe relief), I didn't see a significant difference between the following pairs:

- `SAG-v3/build/nptest benchmarks/jobset-log-uniform-discrete_69.csv` vs. `SAG-bitfield/build/nptest benchmarks/jobset-log-uniform-discrete_69.csv`
- `SAG-v3/build/nptest benchmarks/jobset-log-uniform-discrete_73.csv` vs. `SAG-bitfield/build/nptest benchmarks/jobset-log-uniform-discrete_73.csv`
- `SAG-v3/build/nptest benchmarks/jobset-log-uniform-discrete_94.csv` vs. `SAG-bitfield/build/nptest benchmarks/jobset-log-uniform-discrete_94.csv`
- `SAG-v3/build/nptest benchmarks/jobset-uniform-discrete_29.csv` vs. `SAG-bitfield/build/nptest benchmarks/jobset-uniform-discrete_29.csv`
The compiler is identified as `GNU 10.2.1`. It appears that this version is compiling the `vector<bool>` code just fine. What compiler versions did you see major slowdowns with?
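
For reference, the kind of replacement I have in mind is a job-index set backed by raw 64-bit words instead of `std::vector<bool>`. The sketch below is my own illustration of that idea; the class name and interface are invented here and not copied from the `index_set-optimization` branch.

```cpp
// Hypothetical sketch of a bitfield-backed index set (names and interface
// invented for illustration). One bit per job index, stored in raw words,
// so lookups and whole-set operations compile to simple word-wise logic
// regardless of how well the library's vector<bool> is optimized.
#include <cstddef>
#include <cstdint>
#include <vector>

class Bitfield_index_set {
    std::vector<std::uint64_t> words;

public:
    explicit Bitfield_index_set(std::size_t num_jobs)
    : words((num_jobs + 63) / 64, 0)
    {
    }

    void add(std::size_t job_index)
    {
        words[job_index / 64] |= std::uint64_t{1} << (job_index % 64);
    }

    bool contains(std::size_t job_index) const
    {
        return (words[job_index / 64] >> (job_index % 64)) & 1;
    }

    // Subset test (assumes both sets were sized for the same job count),
    // the kind of operation a state-merging step would perform repeatedly.
    bool is_subset_of(const Bitfield_index_set& other) const
    {
        for (std::size_t w = 0; w < words.size(); w++)
            if (words[w] & ~other.words[w])
                return false;
        return true;
    }
};
```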