fair-acc / gnuradio4

Prototype implementations for a more compile-time efficient flowgraph API

Performance optimization for MultiThreadedStrategy #373

Closed by drslebedev 2 months ago

drslebedev commented 3 months ago

The bm_Buffer benchmark showed a significant performance degradation in the multi-producer case after this commit: https://github.com/fair-acc/gnuradio4/commit/5bd3eed14076427b1d50997ae6c2a77863876121.

The CircularBuffer was significantly updated: it now uses the newly developed AtomicBitset to check for available slots.
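For readers unfamiliar with the idea, here is a minimal sketch of how an atomic bitset can track slot availability. It is not the AtomicBitset from this PR: the class name `SketchAtomicBitset`, its member functions, the 64-bit word size, and the memory orderings are all assumptions chosen for illustration.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration only; NOT the gnuradio4 AtomicBitset API.
// One bit per slot: 1 = slot taken, 0 = slot available.
template<std::size_t N>
class SketchAtomicBitset {
    static constexpr std::size_t kBitsPerWord = 64;
    static constexpr std::size_t kWords       = (N + kBitsPerWord - 1) / kBitsPerWord;

    std::array<std::atomic<std::uint64_t>, kWords> _words{}; // all bits start at 0

    static constexpr std::uint64_t mask(std::size_t i) noexcept {
        return std::uint64_t{1} << (i % kBitsPerWord);
    }

public:
    // Atomically set bit i. Returns true only for the caller that flipped it
    // from 0 to 1, so two producers can never both claim the same slot.
    bool tryClaim(std::size_t i) noexcept {
        assert(i < N);
        const std::uint64_t prev = _words[i / kBitsPerWord].fetch_or(mask(i), std::memory_order_acq_rel);
        return (prev & mask(i)) == 0;
    }

    // Atomically clear bit i, making the slot available again.
    void release(std::size_t i) noexcept {
        assert(i < N);
        _words[i / kBitsPerWord].fetch_and(~mask(i), std::memory_order_release);
    }

    // Check whether slot i is currently taken.
    bool test(std::size_t i) const noexcept {
        assert(i < N);
        return (_words[i / kBitsPerWord].load(std::memory_order_acquire) & mask(i)) != 0;
    }
};
```

In this sketch a producer would call `tryClaim(i)` before writing to slot `i` and `release(i)` once the slot can be reused. Because `fetch_or` returns the previous word, the claim is decided in a single atomic step without a lock.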

Other changes:

Known issues:

Special thanks to @RalphSteinhagen and @wirew0rm for their help and discussions during work on this PR.

Performance results:

all micro-benchmarks passed:
┌──────────────────benchmark:──────────────────┬──────┬──CPU branch misses───┬─ops/s─┬──mean──┬─<CPU-I>─┬────CPU cache misses────┬─stddev─┬─#N─┬─CTX-SW─┬─total time─┬──min───┬─median─┬──max───┐
│                                              │ SKIP │                      │       │        │         │                        │        │    │        │            │        │        │        │
│    POSIX: 1 producers -< 1  >-> 1 consumers  │ PASS │ 59.8k / 519k = 11.5% │ 25.7M │  39 ms │    302m │   11.9k / 101k = 11.8% │   5 ms │ 8  │   105  │     312 ms │  36 ms │  37 ms │  53 ms │
│    POSIX: 1 producers -< 1  >-> 2 consumers  │ PASS │ 75.7k / 666k = 11.4% │ 18.8M │  53 ms │    387m │ 5.35k / 100.0k =  5.4% │ 709 us │ 8  │   112  │     425 ms │  52 ms │  53 ms │  55 ms │
│    POSIX: 1 producers -< 1  >-> 4 consumers  │ PASS │ 71.6k / 617k = 11.6% │ 12.3M │  82 ms │    359m │   18.5k / 122k = 15.2% │   3 ms │ 8  │   120  │     653 ms │  79 ms │  81 ms │  88 ms │
│    POSIX: 2 producers -< 1  >-> 1 consumers  │ PASS │ 70.8k / 586k = 12.1% │  5.9M │ 170 ms │    341m │   37.1k / 154k = 24.1% │  22 ms │ 8  │   135  │       1  s │ 136 ms │ 182 ms │ 192 ms │
│    POSIX: 2 producers -< 1  >-> 2 consumers  │ PASS │ 74.5k / 641k = 11.6% │  5.6M │ 178 ms │    372m │   15.1k / 132k = 11.5% │  10 ms │ 8  │   138  │       1  s │ 153 ms │ 182 ms │ 185 ms │
│    POSIX: 2 producers -< 1  >-> 4 consumers  │ PASS │ 85.4k / 741k = 11.5% │  5.6M │ 179 ms │    431m │   8.81k / 118k =  7.4% │   1 ms │ 8  │   136  │       1  s │ 177 ms │ 178 ms │ 181 ms │
│    POSIX: 4 producers -< 1  >-> 1 consumers  │ PASS │ 88.4k / 744k = 11.9% │  5.1M │ 195 ms │    433m │   35.7k / 162k = 22.1% │   3 ms │ 8  │   149  │       2  s │ 189 ms │ 196 ms │ 199 ms │
│    POSIX: 4 producers -< 1  >-> 2 consumers  │ PASS │ 97.6k / 852k = 11.5% │  5.3M │ 190 ms │    496m │   8.51k / 142k =  6.0% │  10 ms │ 8  │   161  │       2  s │ 178 ms │ 194 ms │ 202 ms │
│    POSIX: 4 producers -< 1  >-> 4 consumers  │ PASS │ 142k / 1.24M = 11.4% │  3.5M │ 289 ms │    725m │   12.1k / 222k =  5.5% │   6 ms │ 8  │   267  │       2  s │ 275 ms │ 291 ms │ 293 ms │
├──────────────────────────────────────────────┼──────┼──────────────────────┼───────┼────────┼─────────┼────────────────────────┼────────┼────┼────────┼────────────┼────────┼────────┼────────┤
│    POSIX: 1 producers -<1024>-> 1 consumers  │ PASS │ 33.2k / 298k = 11.1% │  3.3G │ 299 us │    173m │  1.59k / 10.8k = 14.8% │ 582 us │ 8  │     9  │       2 ms │  24 us │  25 us │   2 ms │
│    POSIX: 1 producers -<1024>-> 2 consumers  │ PASS │ 32.4k / 297k = 10.9% │ 10.1G │  99 us │    173m │  1.43k / 7.32k = 19.5% │ 150 us │ 8  │     3  │     789 us │  40 us │  43 us │ 496 us │
│    POSIX: 1 producers -<1024>-> 4 consumers  │ PASS │ 52.6k / 454k = 11.6% │  3.9G │ 253 us │    265m │  1.56k / 15.5k = 10.1% │ 212 us │ 8  │    12  │       2 ms │ 155 us │ 156 us │ 801 us │
│    POSIX: 2 producers -<1024>-> 1 consumers  │ PASS │ 65.0k / 561k = 11.6% │  219M │   5 ms │    328m │  6.44k / 62.4k = 10.3% │  48 us │ 8  │    64  │      36 ms │   4 ms │   5 ms │   5 ms │
│    POSIX: 2 producers -<1024>-> 2 consumers  │ PASS │ 59.2k / 511k = 11.6% │  217M │   5 ms │    298m │  6.63k / 62.0k = 10.7% │  19 us │ 8  │    64  │      37 ms │   5 ms │   5 ms │   5 ms │
│    POSIX: 2 producers -<1024>-> 4 consumers  │ PASS │ 71.1k / 614k = 11.6% │  224M │   4 ms │    359m │  4.78k / 54.5k =  8.8% │ 261 us │ 8  │    64  │      36 ms │   4 ms │   5 ms │   5 ms │
│    POSIX: 4 producers -<1024>-> 1 consumers  │ PASS │ 67.8k / 586k = 11.6% │  315M │   3 ms │    342m │  5.20k / 55.5k =  9.4% │ 541 us │ 8  │    57  │      25 ms │   3 ms │   3 ms │   5 ms │
│    POSIX: 4 producers -<1024>-> 2 consumers  │ PASS │ 72.8k / 629k = 11.6% │  313M │   3 ms │    367m │  4.20k / 51.4k =  8.2% │ 508 us │ 8  │    57  │      26 ms │   3 ms │   3 ms │   5 ms │
│    POSIX: 4 producers -<1024>-> 4 consumers  │ PASS │ 67.1k / 583k = 11.5% │  329M │   3 ms │    340m │  4.27k / 55.7k =  7.7% │  21 us │ 8  │    56  │      24 ms │   3 ms │   3 ms │   3 ms │
├──────────────────────────────────────────────┼──────┼──────────────────────┼───────┼────────┼─────────┼────────────────────────┼────────┼────┼────────┼────────────┼────────┼────────┼────────┤
│ portable: 1 producers -< 1  >-> 1 consumers  │ PASS │ 69.7k / 605k = 11.5% │ 28.0M │  36 ms │    353m │   12.3k / 103k = 11.9% │ 777 us │ 8  │   104  │     286 ms │  35 ms │  36 ms │  37 ms │
│ portable: 1 producers -< 1  >-> 2 consumers  │ PASS │ 66.9k / 530k = 12.6% │ 17.0M │  59 ms │    308m │   47.1k / 146k = 32.4% │   6 ms │ 8  │   112  │     472 ms │  53 ms │  57 ms │  67 ms │
│ portable: 1 producers -< 1  >-> 4 consumers  │ PASS │ 69.7k / 606k = 11.5% │  8.2M │ 122 ms │    352m │   7.67k / 114k =  6.7% │   4 ms │ 8  │   128  │     973 ms │ 112 ms │ 124 ms │ 126 ms │
│ portable: 2 producers -< 1  >-> 1 consumers  │ PASS │ 77.6k / 533k = 14.5% │  6.5M │ 155 ms │    310m │    106k / 223k = 47.5% │  13 ms │ 8  │   129  │       1  s │ 142 ms │ 151 ms │ 187 ms │
│ portable: 2 producers -< 1  >-> 2 consumers  │ PASS │ 71.1k / 625k = 11.4% │  5.5M │ 183 ms │    364m │   7.00k / 122k =  5.7% │   3 ms │ 8  │   136  │       1  s │ 177 ms │ 185 ms │ 187 ms │
│ portable: 2 producers -< 1  >-> 4 consumers  │ PASS │ 86.5k / 759k = 11.4% │  5.5M │ 182 ms │    442m │   4.40k / 117k =  3.7% │   4 ms │ 8  │   142  │       1  s │ 178 ms │ 182 ms │ 188 ms │
│ portable: 4 producers -< 1  >-> 1 consumers  │ PASS │ 86.2k / 755k = 11.4% │  5.1M │ 198 ms │    439m │   19.2k / 138k = 13.9% │   6 ms │ 8  │   155  │       2  s │ 186 ms │ 201 ms │ 206 ms │
│ portable: 4 producers -< 1  >-> 2 consumers  │ PASS │ 93.6k / 815k = 11.5% │  4.9M │ 205 ms │    474m │   8.79k / 140k =  6.3% │   3 ms │ 8  │   163  │       2  s │ 202 ms │ 204 ms │ 210 ms │
│ portable: 4 producers -< 1  >-> 4 consumers  │ PASS │ 137k / 1.21M = 11.4% │  3.6M │ 275 ms │    703m │   13.2k / 224k =  5.9% │  23 ms │ 8  │   264  │       2  s │ 217 ms │ 284 ms │ 292 ms │
├──────────────────────────────────────────────┼──────┼──────────────────────┼───────┼────────┼─────────┼────────────────────────┼────────┼────┼────────┼────────────┼────────┼────────┼────────┤
│ portable: 1 producers -<1024>-> 1 consumers  │ PASS │ 58.8k / 503k = 11.7% │  2.3G │ 430 us │    294m │  1.63k / 17.3k =  9.4% │ 581 us │ 8  │    16  │       3 ms │ 154 us │ 288 us │   2 ms │
│ portable: 1 producers -<1024>-> 2 consumers  │ PASS │ 59.4k / 510k = 11.6% │  3.4G │ 294 us │    298m │  1.55k / 15.0k = 10.3% │ 221 us │ 8  │    14  │       2 ms │ 153 us │ 155 us │ 799 us │
│ portable: 1 producers -<1024>-> 4 consumers  │ PASS │ 57.3k / 491k = 11.7% │  3.2G │ 315 us │    287m │  1.91k / 20.4k =  9.3% │  68 us │ 8  │    17  │       3 ms │ 288 us │ 289 us │ 495 us │
│ portable: 2 producers -<1024>-> 1 consumers  │ PASS │ 64.7k / 564k = 11.5% │  146M │   7 ms │    329m │  2.36k / 65.8k =  3.6% │  33 us │ 8  │    72  │      55 ms │   7 ms │   7 ms │   7 ms │
│ portable: 2 producers -<1024>-> 2 consumers  │ PASS │ 62.5k / 546k = 11.4% │  145M │   7 ms │    318m │  1.97k / 62.2k =  3.2% │  40 us │ 8  │    72  │      55 ms │   7 ms │   7 ms │   7 ms │
│ portable: 2 producers -<1024>-> 4 consumers  │ PASS │ 72.5k / 630k = 11.5% │  146M │   7 ms │    367m │  1.67k / 64.8k =  2.6% │  22 us │ 8  │    72  │      55 ms │   7 ms │   7 ms │   7 ms │
│ portable: 4 producers -<1024>-> 1 consumers  │ PASS │ 76.1k / 660k = 11.5% │  224M │   4 ms │    385m │  1.88k / 57.8k =  3.3% │ 237 us │ 8  │    65  │      36 ms │   4 ms │   5 ms │   5 ms │
│ portable: 4 producers -<1024>-> 2 consumers  │ PASS │ 74.2k / 645k = 11.5% │  225M │   4 ms │    377m │  2.75k / 54.2k =  5.1% │   1 ms │ 8  │    64  │      36 ms │   3 ms │   5 ms │   7 ms │
│ portable: 4 producers -<1024>-> 4 consumers  │ PASS │ 75.9k / 659k = 11.5% │  218M │   5 ms │    385m │  3.10k / 55.5k =  5.6% │  66 us │ 8  │    65  │      37 ms │   5 ms │   5 ms │   5 ms │
└──────────────────────────────────────────────┴──────┴──────────────────────┴───────┴────────┴─────────┴────────────────────────┴────────┴────┴────────┴────────────┴────────┴────────┴────────┘
sonarcloud[bot] commented 2 months ago

Quality Gate failed

Failed conditions
58.9% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud