gnelissen / np-schedulability-analysis

An implementation of schedulability tests for non-preemptive job sets, for uni- and global multiprocessors, with precedence constraints.
BSD 3-Clause "New" or "Revised" License

trying to understand impact of `vector<bool>` performance #26

Closed brandenburg closed 1 month ago

brandenburg commented 2 months ago

Sorry for opening yet another issue, but I figured this is better than continuing to add messages to the already closed PR #20.

I was wondering whether vector<bool> being poorly compiled could have affected evaluation results in earlier papers. It certainly would be unfortunate to accidentally have reported needlessly slow results.

To explore this further, I compiled the latest v3.0 branch and the index_set-optimization branch on our group's main Linux compute machine and compared the runtime performance with hyperfine. To my surprise (and maybe relief), I didn't see a significant difference:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `SAG-v3/build/nptest benchmarks/jobset-log-uniform-discrete_69.csv` | 204.8 ± 4.9 | 200.4 | 220.8 | 1.03 ± 0.06 |
| `SAG-bitfield/build/nptest benchmarks/jobset-log-uniform-discrete_69.csv` | 198.9 ± 10.2 | 179.4 | 221.6 | 1.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `SAG-v3/build/nptest benchmarks/jobset-log-uniform-discrete_73.csv` | 24.6 ± 3.2 | 10.2 | 27.8 | 1.00 |
| `SAG-bitfield/build/nptest benchmarks/jobset-log-uniform-discrete_73.csv` | 25.1 ± 1.6 | 14.1 | 27.2 | 1.02 ± 0.15 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `SAG-v3/build/nptest benchmarks/jobset-log-uniform-discrete_94.csv` | 17.9 ± 1.6 | 11.1 | 24.7 | 1.00 |
| `SAG-bitfield/build/nptest benchmarks/jobset-log-uniform-discrete_94.csv` | 18.0 ± 1.1 | 10.8 | 19.7 | 1.00 ± 0.11 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `SAG-v3/build/nptest benchmarks/jobset-uniform-discrete_29.csv` | 148.3 ± 5.1 | 129.2 | 153.6 | 1.00 ± 0.04 |
| `SAG-bitfield/build/nptest benchmarks/jobset-uniform-discrete_29.csv` | 148.1 ± 2.7 | 143.3 | 154.6 | 1.00 |

The compiler is identified as GNU 10.2.1. It appears that this version is compiling the vector<bool> code just fine.

What compiler versions did you see major slowdowns with?

porya-gohary commented 2 months ago

I have used GCC versions 11.4 and 12.3. But the times you get for these job sets are too fast; I still think you were probably not using the `-m 4` option (as I mentioned in #20, these examples are for 4 cores). For example, for jobset-log-uniform-discrete_69.csv I get this output:

jobsets/jobset-log-uniform-discrete_69.csv,  1,  68767,  1110411,  2843423,  1001,  2625.695838,  54.609375,  0,  4
brandenburg commented 2 months ago

Yes, the table shows the full invocation. I did not specify -m 4; I missed your comment.

brandenburg commented 2 months ago

I'm seeing this output now (on macOS):

# file name, schedulable?(1Y/0N), #jobs, #nodes, #states, #edges, max width, CPU time, memory, timeout, #CPUs
benchmarks/jobset-log-uniform-discrete_69.csv,  1,  68767,  1031793,  1111366,  2844642,  1001,  5.074074,  141080.000000,  0,  4

It agrees in outcome and #jobs, but not #nodes nor #edges. Is that reasonable? How is that possible?

porya-gohary commented 2 months ago

> I'm seeing this output now (on macOS):
>
> `# file name, schedulable?(1Y/0N), #jobs, #nodes, #states, #edges, max width, CPU time, memory, timeout, #CPUs`
> `benchmarks/jobset-log-uniform-discrete_69.csv,  1,  68767,  1031793,  1111366,  2844642,  1001,  5.074074,  141080.000000,  0,  4`
>
> It agrees in outcome and #jobs, but not #nodes nor #edges. Is that reasonable? How is that possible?

I guess this is the result you got with the v3.0 branch, right? The result I mentioned in the previous comment was for the master branch.

For me too, the v3.0 branch, which uses the new method, gives the same result as yours:

jobsets/jobset-log-uniform-discrete_69.csv,  1,  68767,  1031793,  1111366,  2844642,  1001,  4.936089,  70.277344,  0,  4
brandenburg commented 2 months ago

Here's the outcome of the rerun with `-m 4` on my 2017 iMac with Xcode's clang version:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 5.511 ± 0.186 | 5.301 | 6.526 | 1.02 ± 0.05 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 5.419 ± 0.178 | 5.320 | 6.601 | 1.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_73.csv` | 492.6 ± 6.5 | 484.9 | 529.3 | 1.01 ± 0.02 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_73.csv` | 488.4 ± 4.9 | 481.5 | 506.9 | 1.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_94.csv` | 349.5 ± 3.1 | 344.8 | 365.7 | 1.01 ± 0.01 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_94.csv` | 344.7 ± 2.7 | 341.2 | 361.9 | 1.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-uniform-discrete_29.csv` | 707.2 ± 11.2 | 696.8 | 764.0 | 1.00 ± 0.02 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-uniform-discrete_29.csv` | 706.6 ± 11.2 | 694.5 | 753.4 | 1.00 |

So at least it's still the case that there's no major performance difference on macOS.

brandenburg commented 2 months ago

Unfortunately, the Linux numbers I get now confirm the issue also for g++ 10.2.1. It's rather shocking how big the difference is.

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 433.015 ± 0.270 | 432.657 | 433.591 | 116.65 ± 1.38 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 3.712 ± 0.044 | 3.680 | 3.813 | 1.00 |
brandenburg commented 2 months ago

Here are updated numbers from Linux. Out of curiosity, I also compiled the two SAG versions with clang++ version 18.1.8 (which is very recent), to see if g++ really is the culprit.

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 437.694 ± 5.531 | 433.031 | 445.135 | 115.31 ± 1.65 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 3.796 ± 0.026 | 3.763 | 3.846 | 1.00 |
| `clang++/SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 569.099 ± 4.771 | 559.504 | 573.826 | 149.93 ± 1.61 |
| `clang++/SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_69.csv` | 4.105 ± 0.013 | 4.085 | 4.126 | 1.08 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_73.csv` | 3.180 ± 0.029 | 3.149 | 3.257 | 9.68 ± 0.29 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_73.csv` | 0.328 ± 0.009 | 0.311 | 0.337 | 1.00 |
| `clang++/SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_73.csv` | 4.008 ± 0.010 | 3.989 | 4.017 | 12.20 ± 0.35 |
| `clang++/SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_73.csv` | 0.378 ± 0.011 | 0.362 | 0.386 | 1.15 ± 0.05 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_94.csv` | 1.596 ± 0.020 | 1.576 | 1.642 | 6.97 ± 0.18 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_94.csv` | 0.229 ± 0.005 | 0.211 | 0.232 | 1.00 |
| `clang++/SAG-v3/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_94.csv` | 2.006 ± 0.011 | 1.991 | 2.017 | 8.76 ± 0.21 |
| `clang++/SAG-bitfield/build/nptest -m 4 benchmarks/jobset-log-uniform-discrete_94.csv` | 0.259 ± 0.002 | 0.256 | 0.261 | 1.13 ± 0.03 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `SAG-v3/build/nptest -m 4 benchmarks/jobset-uniform-discrete_29.csv` | 33.472 ± 0.078 | 33.357 | 33.585 | 67.90 ± 1.24 |
| `SAG-bitfield/build/nptest -m 4 benchmarks/jobset-uniform-discrete_29.csv` | 0.493 ± 0.009 | 0.473 | 0.500 | 1.00 |
| `clang++/SAG-v3/build/nptest -m 4 benchmarks/jobset-uniform-discrete_29.csv` | 41.914 ± 0.148 | 41.638 | 42.149 | 85.02 ± 1.56 |
| `clang++/SAG-bitfield/build/nptest -m 4 benchmarks/jobset-uniform-discrete_29.csv` | 0.521 ± 0.011 | 0.501 | 0.533 | 1.06 ± 0.03 |

This shows a couple of interesting things:

  1. The new bitfield variant is always the quickest. So, between these numbers and @gnelissen's observations on Windows, there is no reason to keep any other implementation around.
  2. The most recent stable clang has the very same issue as g++, yet clang works just fine on macOS. This strongly suggests that the issue is in the STL implementation, not in the compiler. (AFAIK clang++ and g++ use the same GNU standard library on Linux, but a different one on macOS.)
  3. clang++ produces moderately longer runtimes than g++ does, so g++ is still the preferred compiler for SAG.