NVIDIA / cccl

CUDA Core Compute Libraries
https://nvidia.github.io/cccl/

fix thread-reduce performance regression #2944

Closed fbusato closed 2 days ago

fbusato commented 6 days ago

Fix nvbug: 4965585

The following routines showed performance regressions after PR #2756:

The PR includes the following changes:

bernhardmgruber commented 6 days ago

Could you please show a benchmark diff of the three algorithms before #2756 and after this PR? We should see a net benefit then.

Instructions on how to benchmark, in case you need them: https://nvidia.github.io/cccl/cub/benchmarking.html

fbusato commented 6 days ago

Reduce Max

[0] NVIDIA H100 80GB HBM3

| T{ct} | OffsetT{ct} | Elements{io} | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|-------|-------------|--------------|----------|-----------|----------|-----------|------|-------|--------|
| I8 | I32 | 2^28 | 107.167 us | 0.39% | 122.152 us | 0.43% | 14.986 us | 13.98% | SLOW |
| I8 | I64 | 2^28 | 108.946 us | 1.96% | 109.357 us | 2.03% | 0.411 us | 0.38% | SAME |
| I16 | I32 | 2^28 | 188.094 us | 1.98% | 190.235 us | 2.02% | 2.142 us | 1.14% | SAME |
| I16 | I64 | 2^28 | 188.058 us | 1.88% | 189.965 us | 1.94% | 1.907 us | 1.01% | SAME |
| I32 | I32 | 2^28 | 351.776 us | 1.38% | 352.035 us | 1.45% | 0.259 us | 0.07% | SAME |
| I32 | I64 | 2^28 | 352.221 us | 1.38% | 352.644 us | 1.41% | 0.424 us | 0.12% | SAME |
| I64 | I32 | 2^28 | 688.110 us | 0.76% | 687.958 us | 0.81% | -0.152 us | -0.02% | SAME |
| I64 | I64 | 2^28 | 688.580 us | 0.83% | 688.623 us | 0.85% | 0.043 us | 0.01% | SAME |
| I128 | I32 | 2^28 | 1.400 ms | 0.27% | 1.403 ms | 0.28% | 2.806 us | 0.20% | SAME |
| I128 | I64 | 2^28 | 1.404 ms | 1.43% | 1.397 ms | 1.56% | -6.640 us | -0.47% | SAME |
| F32 | I32 | 2^28 | 359.793 us | 3.51% | 359.978 us | 3.54% | 0.185 us | 0.05% | SAME |
| F32 | I64 | 2^28 | 352.525 us | 1.47% | 352.488 us | 1.44% | -0.037 us | -0.01% | SAME |
| F64 | I32 | 2^28 | 688.145 us | 0.82% | 688.022 us | 0.78% | -0.123 us | -0.02% | SAME |
| F64 | I64 | 2^28 | 688.338 us | 0.83% | 688.361 us | 0.90% | 0.023 us | 0.00% | SAME |
| C64 | I32 | 2^28 | 1.479 ms | 0.06% | 1.468 ms | 0.07% | -11.115 us | -0.75% | FAST |
| C64 | I64 | 2^28 | 1.550 ms | 0.07% | 1.524 ms | 0.07% | -25.813 us | -1.67% | FAST |

Select Flagged

T{ct} OffsetT{ct} IsInPlace{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 false 2^28 1 666.930 us 0.32% 646.117 us 0.29% -20.812 us -3.12% FAST
I8 I32 false 2^28 0.544 646.175 us 0.29% 624.590 us 0.24% -21.586 us -3.34% FAST
I8 I32 false 2^28 0 539.994 us 0.27% 525.318 us 0.22% -14.675 us -2.72% FAST
I8 I32 true 2^28 1 766.635 us 0.22% 747.091 us 0.21% -19.544 us -2.55% FAST
I8 I32 true 2^28 0.544 752.011 us 0.21% 727.137 us 0.20% -24.873 us -3.31% FAST
I8 I32 true 2^28 0 642.715 us 0.22% 629.653 us 0.20% -13.062 us -2.03% FAST
I8 I64 false 2^28 1 687.225 us 0.27% 655.494 us 0.23% -31.731 us -4.62% FAST
I8 I64 false 2^28 0.544 669.616 us 0.28% 638.069 us 0.21% -31.547 us -4.71% FAST
I8 I64 false 2^28 0 550.467 us 0.28% 530.086 us 0.26% -20.381 us -3.70% FAST
I8 I64 true 2^28 1 791.146 us 0.19% 755.288 us 0.23% -35.858 us -4.53% FAST
I8 I64 true 2^28 0.544 773.481 us 0.20% 739.127 us 0.21% -34.354 us -4.44% FAST
I8 I64 true 2^28 0 653.690 us 0.24% 636.169 us 0.22% -17.521 us -2.68% FAST
I16 I32 false 2^28 1 751.677 us 0.38% 765.957 us 0.36% 14.280 us 1.90% SLOW
I16 I32 false 2^28 0.544 704.825 us 0.30% 723.212 us 0.29% 18.387 us 2.61% SLOW
I16 I32 false 2^28 0 595.444 us 0.26% 563.434 us 0.24% -32.010 us -5.38% FAST
I16 I32 true 2^28 1 850.084 us 0.26% 862.771 us 0.25% 12.687 us 1.49% SLOW
I16 I32 true 2^28 0.544 812.488 us 0.22% 824.056 us 0.23% 11.568 us 1.42% SLOW
I16 I32 true 2^28 0 711.684 us 0.20% 674.914 us 0.21% -36.770 us -5.17% FAST
I16 I64 false 2^28 1 784.931 us 0.28% 783.574 us 0.31% -1.357 us -0.17% SAME
I16 I64 false 2^28 0.544 733.121 us 0.26% 731.942 us 0.28% -1.179 us -0.16% SAME
I16 I64 false 2^28 0 570.107 us 0.25% 571.236 us 0.26% 1.129 us 0.20% SAME
I16 I64 true 2^28 1 876.321 us 0.24% 875.572 us 0.24% -0.749 us -0.09% SAME
I16 I64 true 2^28 0.544 831.914 us 0.20% 832.294 us 0.20% 0.380 us 0.05% SAME
I16 I64 true 2^28 0 685.799 us 0.18% 687.535 us 0.18% 1.736 us 0.25% SLOW
I32 I32 false 2^28 1 1.122 ms 0.43% 1.022 ms 0.44% -100.502 us -8.96% FAST
I32 I32 false 2^28 0.544 1.018 ms 0.27% 892.510 us 0.33% -125.439 us -12.32% FAST
I32 I32 false 2^28 0 799.558 us 0.24% 664.654 us 0.27% -134.904 us -16.87% FAST
I32 I32 true 2^28 1 1.253 ms 0.23% 1.118 ms 0.34% -134.791 us -10.76% FAST
I32 I32 true 2^28 0.544 1.178 ms 0.19% 1.012 ms 0.30% -165.656 us -14.06% FAST
I32 I32 true 2^28 0 984.614 us 0.15% 793.354 us 0.23% -191.260 us -19.42% FAST
I32 I64 false 2^28 1 1.062 ms 0.58% 1.026 ms 0.43% -36.118 us -3.40% FAST
I32 I64 false 2^28 0.544 913.881 us 0.54% 888.767 us 0.36% -25.114 us -2.75% FAST
I32 I64 false 2^28 0 690.710 us 0.35% 668.789 us 0.29% -21.921 us -3.17% FAST
I32 I64 true 2^28 1 1.124 ms 0.43% 1.121 ms 0.35% -3.175 us -0.28% SAME
I32 I64 true 2^28 0.544 1.006 ms 0.31% 1.006 ms 0.30% -0.527 us -0.05% SAME
I32 I64 true 2^28 0 805.761 us 0.23% 798.519 us 0.22% -7.242 us -0.90% FAST
I64 I32 false 2^28 1 1.821 ms 0.43% 1.823 ms 0.45% 1.098 us 0.06% SAME
I64 I32 false 2^28 0.544 1.496 ms 0.61% 1.496 ms 0.59% 0.517 us 0.03% SAME
I64 I32 false 2^28 0 1.010 ms 0.39% 1.009 ms 0.40% -1.132 us -0.11% SAME
I64 I32 true 2^28 1 1.936 ms 0.33% 1.935 ms 0.31% -1.034 us -0.05% SAME
I64 I32 true 2^28 0.544 1.639 ms 0.40% 1.639 ms 0.43% 0.101 us 0.01% SAME
I64 I32 true 2^28 0 1.192 ms 0.26% 1.191 ms 0.26% -0.858 us -0.07% SAME
I64 I64 false 2^28 1 1.819 ms 0.41% 1.816 ms 0.41% -3.146 us -0.17% SAME
I64 I64 false 2^28 0.544 1.496 ms 0.60% 1.493 ms 0.60% -3.054 us -0.20% SAME
I64 I64 false 2^28 0 1.021 ms 0.43% 1.019 ms 0.46% -2.154 us -0.21% SAME
I64 I64 true 2^28 1 1.936 ms 0.33% 1.932 ms 0.32% -4.479 us -0.23% SAME
I64 I64 true 2^28 0.544 1.638 ms 0.41% 1.633 ms 0.43% -4.543 us -0.28% SAME
I64 I64 true 2^28 0 1.200 ms 0.28% 1.202 ms 0.26% 2.752 us 0.23% SAME
I128 I32 false 2^28 1 3.603 ms 0.46% 3.604 ms 0.45% 1.313 us 0.04% SAME
I128 I32 false 2^28 0.544 2.859 ms 0.84% 2.858 ms 0.82% -0.687 us -0.02% SAME
I128 I32 false 2^28 0 1.943 ms 0.69% 1.944 ms 0.68% 0.080 us 0.00% SAME
I128 I32 true 2^28 1 3.820 ms 0.44% 3.820 ms 0.45% 0.541 us 0.01% SAME
I128 I32 true 2^28 0.544 3.192 ms 0.59% 3.192 ms 0.56% 0.359 us 0.01% SAME
I128 I32 true 2^28 0 2.421 ms 0.40% 2.421 ms 0.43% -0.185 us -0.01% SAME
I128 I64 false 2^28 1 3.609 ms 0.59% 3.609 ms 0.59% -0.008 us -0.00% SAME
I128 I64 false 2^28 0.544 2.864 ms 0.82% 2.864 ms 0.84% 0.521 us 0.02% SAME
I128 I64 false 2^28 0 1.953 ms 0.69% 1.954 ms 0.72% 0.404 us 0.02% SAME
I128 I64 true 2^28 1 3.832 ms 0.44% 3.831 ms 0.43% -0.698 us -0.02% SAME
I128 I64 true 2^28 0.544 3.203 ms 0.57% 3.203 ms 0.56% -0.236 us -0.01% SAME
I128 I64 true 2^28 0 2.435 ms 0.40% 2.435 ms 0.39% -0.436 us -0.02% SAME
F32 I32 false 2^28 1 1.123 ms 0.85% 1.024 ms 1.03% -99.082 us -8.82% FAST
F32 I32 false 2^28 0.544 1.018 ms 0.27% 892.420 us 0.34% -125.529 us -12.33% FAST
F32 I32 false 2^28 0 799.450 us 0.22% 664.718 us 0.27% -134.732 us -16.85% FAST
F32 I32 true 2^28 1 1.253 ms 0.25% 1.117 ms 0.34% -136.310 us -10.88% FAST
F32 I32 true 2^28 0.544 1.178 ms 0.20% 1.011 ms 0.28% -166.523 us -14.14% FAST
F32 I32 true 2^28 0 984.513 us 0.15% 793.035 us 0.23% -191.478 us -19.45% FAST
F32 I64 false 2^28 1 1.062 ms 0.59% 1.025 ms 0.41% -36.465 us -3.43% FAST
F32 I64 false 2^28 0.544 913.749 us 0.53% 888.043 us 0.37% -25.705 us -2.81% FAST
F32 I64 false 2^28 0 689.779 us 0.36% 668.831 us 0.30% -20.948 us -3.04% FAST
F32 I64 true 2^28 1 1.123 ms 0.44% 1.120 ms 0.36% -2.997 us -0.27% SAME
F32 I64 true 2^28 0.544 1.006 ms 0.31% 1.005 ms 0.28% -0.527 us -0.05% SAME
F32 I64 true 2^28 0 805.759 us 0.24% 798.182 us 0.23% -7.577 us -0.94% FAST
F64 I32 false 2^28 1 1.822 ms 0.46% 1.823 ms 0.46% 0.258 us 0.01% SAME
F64 I32 false 2^28 0.544 1.496 ms 0.61% 1.497 ms 0.59% 0.757 us 0.05% SAME
F64 I32 false 2^28 0 1.010 ms 0.38% 1.009 ms 0.40% -0.575 us -0.06% SAME
F64 I32 true 2^28 1 1.936 ms 0.30% 1.935 ms 0.31% -1.120 us -0.06% SAME
F64 I32 true 2^28 0.544 1.639 ms 0.41% 1.639 ms 0.42% -0.057 us -0.00% SAME
F64 I32 true 2^28 0 1.192 ms 0.26% 1.192 ms 0.26% -0.607 us -0.05% SAME
F64 I64 false 2^28 1 1.819 ms 0.41% 1.816 ms 0.38% -2.957 us -0.16% SAME
F64 I64 false 2^28 0.544 1.496 ms 0.60% 1.493 ms 0.61% -3.213 us -0.21% SAME
F64 I64 false 2^28 0 1.021 ms 0.45% 1.019 ms 0.44% -2.082 us -0.20% SAME
F64 I64 true 2^28 1 1.936 ms 0.33% 1.931 ms 0.31% -4.830 us -0.25% SAME
F64 I64 true 2^28 0.544 1.638 ms 0.41% 1.633 ms 0.42% -4.501 us -0.27% SAME
F64 I64 true 2^28 0 1.200 ms 0.25% 1.203 ms 0.27% 2.579 us 0.21% SAME

Reduce by-Key

KeyT{ct} ValueT{ct} OffsetT{ct} Elements{io} MaxSegSize Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I8 I32 2^28 2^1 1.042 ms 0.48% 1.070 ms 0.47% 27.182 us 2.61% SLOW
I8 I8 I32 2^28 2^4 912.502 us 0.45% 936.005 us 0.43% 23.503 us 2.58% SLOW
I8 I8 I32 2^28 2^8 905.978 us 0.29% 928.088 us 0.30% 22.110 us 2.44% SLOW
I8 I16 I32 2^28 2^1 1.595 ms 0.28% 1.592 ms 0.29% -2.567 us -0.16% SAME
I8 I16 I32 2^28 2^4 1.262 ms 0.38% 1.240 ms 0.43% -22.581 us -1.79% FAST
I8 I16 I32 2^28 2^8 1.194 ms 0.29% 1.172 ms 0.28% -22.151 us -1.86% FAST
I8 I32 I32 2^28 2^1 1.332 ms 0.59% 1.316 ms 0.60% -16.514 us -1.24% FAST
I8 I32 I32 2^28 2^4 1.015 ms 0.49% 1.002 ms 0.51% -13.342 us -1.31% FAST
I8 I32 I32 2^28 2^8 946.310 us 0.39% 938.284 us 0.42% -8.026 us -0.85% FAST
I8 I64 I32 2^28 2^1 2.028 ms 0.46% 2.032 ms 0.47% 4.866 us 0.24% SAME
I8 I64 I32 2^28 2^4 1.442 ms 0.36% 1.450 ms 0.34% 8.921 us 0.62% SLOW
I8 I64 I32 2^28 2^8 1.308 ms 0.21% 1.320 ms 0.21% 11.956 us 0.91% SLOW
I8 I128 I32 2^28 2^1 5.109 ms 0.17% 5.127 ms 0.18% 17.669 us 0.35% SLOW
I8 I128 I32 2^28 2^4 4.173 ms 0.27% 4.195 ms 0.27% 21.329 us 0.51% SLOW
I8 I128 I32 2^28 2^8 4.015 ms 0.31% 4.034 ms 0.32% 18.978 us 0.47% SLOW
I8 F32 I32 2^28 2^1 1.337 ms 0.75% 1.322 ms 0.78% -14.694 us -1.10% FAST
I8 F32 I32 2^28 2^4 1.018 ms 0.52% 1.007 ms 0.49% -10.626 us -1.04% FAST
I8 F32 I32 2^28 2^8 952.242 us 0.41% 947.759 us 0.42% -4.483 us -0.47% FAST
I8 F64 I32 2^28 2^1 2.043 ms 0.50% 2.046 ms 0.47% 2.364 us 0.12% SAME
I8 F64 I32 2^28 2^4 1.459 ms 0.35% 1.465 ms 0.33% 6.160 us 0.42% SLOW
I8 F64 I32 2^28 2^8 1.325 ms 0.19% 1.332 ms 0.19% 7.488 us 0.57% SLOW
I8 C64 I32 2^28 2^1 5.539 ms 0.18% 5.340 ms 0.18% -199.691 us -3.60% FAST
I8 C64 I32 2^28 2^4 4.998 ms 0.10% 4.802 ms 0.10% -195.590 us -3.91% FAST
I8 C64 I32 2^28 2^8 4.848 ms 0.09% 4.655 ms 0.09% -193.759 us -4.00% FAST
I16 I8 I32 2^28 2^1 1.228 ms 0.54% 1.382 ms 0.44% 153.882 us 12.53% SLOW
I16 I8 I32 2^28 2^4 961.367 us 0.50% 1.107 ms 0.47% 145.262 us 15.11% SLOW
I16 I8 I32 2^28 2^8 928.405 us 0.38% 1.070 ms 0.28% 141.610 us 15.25% SLOW
I16 I16 I32 2^28 2^1 1.136 ms 0.48% 1.139 ms 0.48% 2.700 us 0.24% SAME
I16 I16 I32 2^28 2^4 988.966 us 0.26% 990.565 us 0.31% 1.599 us 0.16% SAME
I16 I16 I32 2^28 2^8 961.984 us 0.16% 963.443 us 0.20% 1.459 us 0.15% SAME
I16 I32 I32 2^28 2^1 1.395 ms 0.54% 1.391 ms 0.60% -4.513 us -0.32% SAME
I16 I32 I32 2^28 2^4 994.327 us 0.44% 994.688 us 0.48% 0.361 us 0.04% SAME
I16 I32 I32 2^28 2^8 930.997 us 0.28% 931.975 us 0.29% 0.978 us 0.11% SAME
I16 I64 I32 2^28 2^1 1.954 ms 0.61% 1.956 ms 0.60% 1.804 us 0.09% SAME
I16 I64 I32 2^28 2^4 1.367 ms 0.35% 1.378 ms 0.35% 10.904 us 0.80% SLOW
I16 I64 I32 2^28 2^8 1.241 ms 0.25% 1.254 ms 0.23% 12.715 us 1.02% SLOW
I16 I128 I32 2^28 2^1 5.191 ms 0.17% 5.202 ms 0.17% 10.509 us 0.20% SLOW
I16 I128 I32 2^28 2^4 4.221 ms 0.29% 4.239 ms 0.30% 17.871 us 0.42% SLOW
I16 I128 I32 2^28 2^8 4.044 ms 0.32% 4.061 ms 0.32% 16.941 us 0.42% SLOW
I16 F32 I32 2^28 2^1 1.394 ms 0.73% 1.257 ms 0.94% -136.863 us -9.82% FAST
I16 F32 I32 2^28 2^4 993.771 us 0.46% 861.820 us 0.55% -131.951 us -13.28% FAST
I16 F32 I32 2^28 2^8 938.546 us 0.25% 791.470 us 0.41% -147.076 us -15.67% FAST
I16 F64 I32 2^28 2^1 1.967 ms 0.55% 1.962 ms 0.63% -4.929 us -0.25% SAME
I16 F64 I32 2^28 2^4 1.397 ms 0.33% 1.382 ms 0.34% -14.949 us -1.07% FAST
I16 F64 I32 2^28 2^8 1.274 ms 0.20% 1.259 ms 0.23% -14.515 us -1.14% FAST
I16 C64 I32 2^28 2^1 5.203 ms 0.18% 4.865 ms 0.20% -337.605 us -6.49% FAST
I16 C64 I32 2^28 2^4 4.583 ms 0.11% 4.219 ms 0.12% -364.973 us -7.96% FAST
I16 C64 I32 2^28 2^8 4.399 ms 0.10% 4.021 ms 0.11% -377.759 us -8.59% FAST
I32 I8 I32 2^28 2^1 1.267 ms 0.53% 1.275 ms 0.52% 8.074 us 0.64% SLOW
I32 I8 I32 2^28 2^4 921.451 us 0.41% 934.302 us 0.40% 12.851 us 1.39% SLOW
I32 I8 I32 2^28 2^8 868.209 us 0.31% 876.769 us 0.33% 8.560 us 0.99% SLOW
I32 I16 I32 2^28 2^1 1.667 ms 0.53% 1.423 ms 0.75% -244.123 us -14.65% FAST
I32 I16 I32 2^28 2^4 1.285 ms 0.33% 975.869 us 0.61% -308.814 us -24.04% FAST
I32 I16 I32 2^28 2^8 1.178 ms 0.32% 902.246 us 0.56% -275.914 us -23.42% FAST
I32 I32 I32 2^28 2^1 1.436 ms 0.86% 1.445 ms 0.93% 8.957 us 0.62% SAME
I32 I32 I32 2^28 2^4 965.283 us 0.63% 966.743 us 0.64% 1.460 us 0.15% SAME
I32 I32 I32 2^28 2^8 887.400 us 0.46% 911.902 us 0.40% 24.502 us 2.76% SLOW
I32 I64 I32 2^28 2^1 2.147 ms 0.64% 2.150 ms 0.67% 3.266 us 0.15% SAME
I32 I64 I32 2^28 2^4 1.453 ms 0.43% 1.460 ms 0.42% 6.317 us 0.43% SLOW
I32 I64 I32 2^28 2^8 1.294 ms 0.27% 1.303 ms 0.25% 8.674 us 0.67% SLOW
I32 I128 I32 2^28 2^1 5.310 ms 0.20% 5.322 ms 0.20% 11.899 us 0.22% SLOW
I32 I128 I32 2^28 2^4 4.277 ms 0.28% 4.293 ms 0.28% 15.860 us 0.37% SLOW
I32 I128 I32 2^28 2^8 4.080 ms 0.33% 4.096 ms 0.35% 15.467 us 0.38% SLOW
I32 F32 I32 2^28 2^1 1.628 ms 0.78% 1.625 ms 0.78% -3.238 us -0.20% SAME
I32 F32 I32 2^28 2^4 1.239 ms 0.53% 1.230 ms 0.56% -8.616 us -0.70% FAST
I32 F32 I32 2^28 2^8 1.135 ms 0.89% 1.118 ms 0.62% -16.887 us -1.49% FAST
I32 F64 I32 2^28 2^1 2.246 ms 0.57% 2.247 ms 0.58% 0.285 us 0.01% SAME
I32 F64 I32 2^28 2^4 1.570 ms 0.34% 1.573 ms 0.32% 3.117 us 0.20% SAME
I32 F64 I32 2^28 2^8 1.418 ms 0.18% 1.422 ms 0.18% 3.940 us 0.28% SLOW
I32 C64 I32 2^28 2^1 5.787 ms 0.19% 5.586 ms 0.20% -201.635 us -3.48% FAST
I32 C64 I32 2^28 2^4 5.119 ms 0.10% 4.926 ms 0.10% -192.607 us -3.76% FAST
I32 C64 I32 2^28 2^8 4.917 ms 0.10% 4.724 ms 0.10% -193.088 us -3.93% FAST
I64 I8 I32 2^28 2^1 2.092 ms 0.58% 2.059 ms 0.62% -32.968 us -1.58% FAST
I64 I8 I32 2^28 2^4 1.534 ms 0.35% 1.520 ms 0.37% -13.774 us -0.90% FAST
I64 I8 I32 2^28 2^8 1.404 ms 0.26% 1.387 ms 0.28% -16.891 us -1.20% FAST
I64 I16 I32 2^28 2^1 2.022 ms 0.66% 2.017 ms 0.68% -5.418 us -0.27% SAME
I64 I16 I32 2^28 2^4 1.422 ms 0.38% 1.423 ms 0.40% 0.243 us 0.02% SAME
I64 I16 I32 2^28 2^8 1.299 ms 0.22% 1.293 ms 0.23% -5.245 us -0.40% FAST
I64 I32 I32 2^28 2^1 2.166 ms 0.68% 2.166 ms 0.69% 0.117 us 0.01% SAME
I64 I32 I32 2^28 2^4 1.406 ms 0.46% 1.407 ms 0.47% 0.556 us 0.04% SAME
I64 I32 I32 2^28 2^8 1.235 ms 0.19% 1.236 ms 0.19% 0.649 us 0.05% SAME
I64 I64 I32 2^28 2^1 2.670 ms 0.62% 2.688 ms 0.63% 18.454 us 0.69% SLOW
I64 I64 I32 2^28 2^4 1.751 ms 0.42% 1.763 ms 0.49% 12.425 us 0.71% SLOW
I64 I64 I32 2^28 2^8 1.555 ms 0.29% 1.570 ms 0.56% 15.361 us 0.99% SLOW
I64 I128 I32 2^28 2^1 6.180 ms 0.16% 6.196 ms 0.16% 16.385 us 0.27% SLOW
I64 I128 I32 2^28 2^4 4.997 ms 0.26% 5.018 ms 0.25% 21.122 us 0.42% SLOW
I64 I128 I32 2^28 2^8 4.747 ms 0.32% 4.772 ms 0.32% 25.223 us 0.53% SLOW
I64 F32 I32 2^28 2^1 2.169 ms 0.92% 2.171 ms 0.91% 1.402 us 0.06% SAME
I64 F32 I32 2^28 2^4 1.407 ms 0.48% 1.408 ms 0.48% 0.848 us 0.06% SAME
I64 F32 I32 2^28 2^8 1.235 ms 0.19% 1.238 ms 0.18% 3.321 us 0.27% SLOW
I64 F64 I32 2^28 2^1 2.681 ms 0.65% 2.700 ms 0.66% 18.499 us 0.69% SLOW
I64 F64 I32 2^28 2^4 1.778 ms 0.39% 1.763 ms 0.51% -15.560 us -0.88% FAST
I64 F64 I32 2^28 2^8 1.577 ms 0.28% 1.564 ms 0.56% -13.492 us -0.86% FAST
I64 C64 I32 2^28 2^1 5.736 ms 0.20% 5.633 ms 0.22% -103.435 us -1.80% FAST
I64 C64 I32 2^28 2^4 4.897 ms 0.12% 4.777 ms 0.13% -120.014 us -2.45% FAST
I64 C64 I32 2^28 2^8 4.579 ms 0.12% 4.445 ms 0.12% -133.984 us -2.93% FAST
I128 I8 I32 2^28 2^1 3.566 ms 0.59% 3.560 ms 0.55% -5.921 us -0.17% SAME
I128 I8 I32 2^28 2^4 2.428 ms 0.78% 2.445 ms 0.78% 16.930 us 0.70% SAME
I128 I8 I32 2^28 2^8 2.194 ms 1.10% 2.195 ms 1.15% 0.625 us 0.03% SAME
I128 I16 I32 2^28 2^1 3.649 ms 0.78% 3.641 ms 0.74% -7.779 us -0.21% SAME
I128 I16 I32 2^28 2^4 2.362 ms 0.88% 2.361 ms 0.86% -0.855 us -0.04% SAME
I128 I16 I32 2^28 2^8 2.134 ms 1.08% 2.120 ms 1.15% -14.144 us -0.66% SAME
I128 I32 I32 2^28 2^1 3.539 ms 0.64% 3.537 ms 0.71% -1.413 us -0.04% SAME
I128 I32 I32 2^28 2^4 2.379 ms 0.91% 2.381 ms 0.86% 1.736 us 0.07% SAME
I128 I32 I32 2^28 2^8 2.149 ms 1.20% 2.151 ms 1.22% 2.119 us 0.10% SAME
I128 I64 I32 2^28 2^1 4.331 ms 0.65% 4.325 ms 0.64% -6.174 us -0.14% SAME
I128 I64 I32 2^28 2^4 3.016 ms 0.67% 3.016 ms 0.71% -0.017 us -0.00% SAME
I128 I64 I32 2^28 2^8 2.733 ms 1.06% 2.731 ms 1.03% -1.767 us -0.06% SAME
I128 I128 I32 2^28 2^1 6.869 ms 0.35% 6.895 ms 0.38% 26.568 us 0.39% SLOW
I128 I128 I32 2^28 2^4 5.371 ms 0.32% 5.399 ms 0.31% 27.652 us 0.51% SLOW
I128 I128 I32 2^28 2^8 5.069 ms 0.39% 5.088 ms 0.39% 18.667 us 0.37% SAME
I128 F32 I32 2^28 2^1 3.543 ms 0.64% 3.544 ms 0.68% 1.352 us 0.04% SAME
I128 F32 I32 2^28 2^4 2.381 ms 0.87% 2.385 ms 0.84% 3.690 us 0.15% SAME
I128 F32 I32 2^28 2^8 2.157 ms 1.23% 2.160 ms 1.20% 2.648 us 0.12% SAME
I128 F64 I32 2^28 2^1 4.089 ms 0.73% 4.097 ms 0.71% 8.141 us 0.20% SAME
I128 F64 I32 2^28 2^4 2.718 ms 1.00% 2.730 ms 1.06% 11.515 us 0.42% SAME
I128 F64 I32 2^28 2^8 2.407 ms 1.50% 2.415 ms 1.42% 7.752 us 0.32% SAME
I128 C64 I32 2^28 2^1 12.900 ms 0.17% 12.900 ms 0.17% 0.389 us 0.00% SAME
I128 C64 I32 2^28 2^4 11.663 ms 0.24% 11.663 ms 0.24% 0.153 us 0.00% SAME
I128 C64 I32 2^28 2^8 11.350 ms 0.28% 11.350 ms 0.28% -0.591 us -0.01% SAME

github-actions[bot] commented 6 days ago

🟩 CI finished in 3h 35m: Pass: 100%/224 | Total: 6d 16h | Avg: 42m 56s | Max: 1h 18m | Hits: 61%/12288
  • 🟩 thrust: Pass: 100%/111 | Total: 2d 13h | Avg: 33m 10s | Max: 1h 03m | Hits: 70%/9260

    ``` 🟩 cmake_options 🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2 | Total: 49m 29s | Avg: 24m 44s | Max: 34m 47s 🟩 cpu 🟩 amd64 Pass: 100%/103 | Total: 2d 09h | Avg: 33m 19s | Max: 1h 03m | Hits: 70%/9260 🟩 arm64 Pass: 100%/8 | Total: 4h 08m | Avg: 31m 07s | Max: 37m 26s 🟩 ctk 🟩 11.1 Pass: 100%/15 | Total: 7h 55m | Avg: 31m 40s | Max: 56m 44s | Hits: 62%/1852 🟩 11.8 Pass: 100%/3 | Total: 2h 04m | Avg: 41m 37s | Max: 46m 35s 🟩 12.5 Pass: 100%/4 | Total: 3h 39m | Avg: 54m 47s | Max: 57m 00s 🟩 12.6 Pass: 100%/89 | Total: 1d 23h | Avg: 32m 10s | Max: 1h 03m | Hits: 71%/7408 🟩 cudacxx 🟩 ClangCUDA18 Pass: 100%/4 | Total: 1h 47m | Avg: 26m 54s | Max: 29m 35s 🟩 nvcc11.1 Pass: 100%/15 | Total: 7h 55m | Avg: 31m 40s | Max: 56m 44s | Hits: 62%/1852 🟩 nvcc11.8 Pass: 100%/3 | Total: 2h 04m | Avg: 41m 37s | Max: 46m 35s 🟩 nvcc12.5 Pass: 100%/4 | Total: 3h 39m | Avg: 54m 47s | Max: 57m 00s 🟩 nvcc12.6 Pass: 100%/85 | Total: 1d 21h | Avg: 32m 24s | Max: 1h 03m | Hits: 71%/7408 🟩 cudacxx_family 🟩 ClangCUDA Pass: 100%/4 | Total: 1h 47m | Avg: 26m 54s | Max: 29m 35s 🟩 nvcc Pass: 100%/107 | Total: 2d 11h | Avg: 33m 24s | Max: 1h 03m | Hits: 70%/9260 🟩 cxx 🟩 Clang9 Pass: 100%/6 | Total: 3h 09m | Avg: 31m 39s | Max: 36m 46s 🟩 Clang10 Pass: 100%/3 | Total: 1h 48m | Avg: 36m 06s | Max: 40m 07s 🟩 Clang11 Pass: 100%/4 | Total: 2h 06m | Avg: 31m 41s | Max: 34m 09s 🟩 Clang12 Pass: 100%/4 | Total: 2h 15m | Avg: 33m 58s | Max: 36m 53s 🟩 Clang13 Pass: 100%/4 | Total: 2h 18m | Avg: 34m 36s | Max: 37m 45s 🟩 Clang14 Pass: 100%/4 | Total: 2h 09m | Avg: 32m 18s | Max: 33m 36s 🟩 Clang15 Pass: 100%/4 | Total: 2h 16m | Avg: 34m 01s | Max: 36m 53s 🟩 Clang16 Pass: 100%/4 | Total: 2h 14m | Avg: 33m 30s | Max: 37m 37s 🟩 Clang17 Pass: 100%/4 | Total: 2h 16m | Avg: 34m 01s | Max: 37m 50s 🟩 Clang18 Pass: 100%/11 | Total: 4h 50m | Avg: 26m 23s | Max: 35m 45s 🟩 GCC6 Pass: 100%/2 | Total: 55m 39s | Avg: 27m 49s | Max: 31m 27s 🟩 GCC7 Pass: 100%/6 | Total: 3h 17m | Avg: 32m 58s | Max: 38m 30s 🟩 GCC8 Pass: 
100%/6 | Total: 3h 10m | Avg: 31m 43s | Max: 37m 15s 🟩 GCC9 Pass: 100%/6 | Total: 3h 13m | Avg: 32m 16s | Max: 38m 34s 🟩 GCC10 Pass: 100%/4 | Total: 2h 17m | Avg: 34m 26s | Max: 38m 45s 🟩 GCC11 Pass: 100%/7 | Total: 4h 25m | Avg: 37m 59s | Max: 46m 35s 🟩 GCC12 Pass: 100%/4 | Total: 2h 13m | Avg: 33m 29s | Max: 36m 10s 🟩 GCC13 Pass: 100%/16 | Total: 6h 09m | Avg: 23m 06s | Max: 41m 12s 🟩 Intel2023.2.0 Pass: 100%/3 | Total: 2h 02m | Avg: 40m 53s | Max: 46m 14s 🟩 MSVC14.16 Pass: 100%/1 | Total: 56m 44s | Avg: 56m 44s | Max: 56m 44s | Hits: 62%/1852 🟩 MSVC14.29 Pass: 100%/2 | Total: 2h 05m | Avg: 1h 02m | Max: 1h 03m | Hits: 62%/3704 🟩 MSVC14.39 Pass: 100%/2 | Total: 1h 28m | Avg: 44m 01s | Max: 1h 03m | Hits: 81%/3704 🟩 NVHPC24.7 Pass: 100%/4 | Total: 3h 39m | Avg: 54m 47s | Max: 57m 00s 🟩 cxx_family 🟩 Clang Pass: 100%/48 | Total: 1d 01h | Avg: 31m 46s | Max: 40m 07s 🟩 GCC Pass: 100%/51 | Total: 1d 01h | Avg: 30m 17s | Max: 46m 35s 🟩 Intel Pass: 100%/3 | Total: 2h 02m | Avg: 40m 53s | Max: 46m 14s 🟩 MSVC Pass: 100%/5 | Total: 4h 30m | Avg: 54m 03s | Max: 1h 03m | Hits: 70%/9260 🟩 NVHPC Pass: 100%/4 | Total: 3h 39m | Avg: 54m 47s | Max: 57m 00s 🟩 gpu 🟩 v100 Pass: 100%/111 | Total: 2d 13h | Avg: 33m 10s | Max: 1h 03m | Hits: 70%/9260 🟩 jobs 🟩 Build Pass: 100%/103 | Total: 2d 11h | Avg: 34m 39s | Max: 1h 03m | Hits: 62%/7408 🟩 TestCPU Pass: 100%/4 | Total: 46m 39s | Avg: 11m 39s | Max: 24m 21s | Hits: 99%/1852 🟩 TestGPU Pass: 100%/4 | Total: 1h 04m | Avg: 16m 14s | Max: 21m 54s 🟩 sm 🟩 60;70;80;90 Pass: 100%/3 | Total: 2h 04m | Avg: 41m 37s | Max: 46m 35s 🟩 90a Pass: 100%/4 | Total: 1h 15m | Avg: 18m 47s | Max: 21m 39s 🟩 std 🟩 11 Pass: 100%/30 | Total: 13h 27m | Avg: 26m 55s | Max: 50m 46s 🟩 14 Pass: 100%/29 | Total: 17h 56m | Avg: 37m 06s | Max: 1h 03m | Hits: 62%/3704 🟩 17 Pass: 100%/27 | Total: 16h 35m | Avg: 36m 52s | Max: 1h 01m | Hits: 62%/1852 🟩 20 Pass: 100%/23 | Total: 12h 32m | Avg: 32m 43s | Max: 1h 03m | Hits: 81%/3704 ```
  • 🟩 cub: Pass: 100%/110 | Total: 4d 02h | Avg: 53m 43s | Max: 1h 18m | Hits: 36%/3028

    ``` 🟩 cpu 🟩 amd64 Pass: 100%/102 | Total: 3d 18h | Avg: 53m 30s | Max: 1h 18m | Hits: 36%/3028 🟩 arm64 Pass: 100%/8 | Total: 7h 31m | Avg: 56m 27s | Max: 57m 35s 🟩 ctk 🟩 11.1 Pass: 100%/15 | Total: 12h 32m | Avg: 50m 09s | Max: 59m 43s | Hits: 36%/757 🟩 11.8 Pass: 100%/3 | Total: 3h 45m | Avg: 1h 15m | Max: 1h 18m 🟩 12.5 Pass: 100%/4 | Total: 4h 30m | Avg: 1h 07m | Max: 1h 14m 🟩 12.6 Pass: 100%/88 | Total: 3d 05h | Avg: 52m 58s | Max: 1h 11m | Hits: 36%/2271 🟩 cudacxx 🟩 ClangCUDA18 Pass: 100%/4 | Total: 4h 09m | Avg: 1h 02m | Max: 1h 03m 🟩 nvcc11.1 Pass: 100%/15 | Total: 12h 32m | Avg: 50m 09s | Max: 59m 43s | Hits: 36%/757 🟩 nvcc11.8 Pass: 100%/3 | Total: 3h 45m | Avg: 1h 15m | Max: 1h 18m 🟩 nvcc12.5 Pass: 100%/4 | Total: 4h 30m | Avg: 1h 07m | Max: 1h 14m 🟩 nvcc12.6 Pass: 100%/84 | Total: 3d 01h | Avg: 52m 31s | Max: 1h 11m | Hits: 36%/2271 🟩 cudacxx_family 🟩 ClangCUDA Pass: 100%/4 | Total: 4h 09m | Avg: 1h 02m | Max: 1h 03m 🟩 nvcc Pass: 100%/106 | Total: 3d 22h | Avg: 53m 24s | Max: 1h 18m | Hits: 36%/3028 🟩 cxx 🟩 Clang9 Pass: 100%/6 | Total: 5h 17m | Avg: 52m 55s | Max: 58m 37s 🟩 Clang10 Pass: 100%/3 | Total: 2h 47m | Avg: 55m 45s | Max: 58m 04s 🟩 Clang11 Pass: 100%/4 | Total: 3h 43m | Avg: 55m 55s | Max: 1h 00m 🟩 Clang12 Pass: 100%/4 | Total: 3h 51m | Avg: 57m 55s | Max: 1h 01m 🟩 Clang13 Pass: 100%/4 | Total: 3h 36m | Avg: 54m 02s | Max: 58m 47s 🟩 Clang14 Pass: 100%/4 | Total: 3h 37m | Avg: 54m 18s | Max: 58m 45s 🟩 Clang15 Pass: 100%/4 | Total: 3h 44m | Avg: 56m 11s | Max: 1h 00m 🟩 Clang16 Pass: 100%/4 | Total: 3h 45m | Avg: 56m 24s | Max: 59m 03s 🟩 Clang17 Pass: 100%/4 | Total: 3h 39m | Avg: 54m 55s | Max: 58m 14s 🟩 Clang18 Pass: 100%/11 | Total: 9h 55m | Avg: 54m 08s | Max: 1h 03m 🟩 GCC6 Pass: 100%/2 | Total: 1h 36m | Avg: 48m 29s | Max: 49m 48s 🟩 GCC7 Pass: 100%/6 | Total: 5h 16m | Avg: 52m 40s | Max: 58m 11s 🟩 GCC8 Pass: 100%/6 | Total: 5h 24m | Avg: 54m 05s | Max: 58m 30s 🟩 GCC9 Pass: 100%/6 | Total: 5h 09m | Avg: 51m 38s | Max: 55m 37s 🟩 GCC10 Pass: 
100%/4 | Total: 4h 00m | Avg: 1h 00m | Max: 1h 02m 🟩 GCC11 Pass: 100%/7 | Total: 7h 28m | Avg: 1h 04m | Max: 1h 18m 🟩 GCC12 Pass: 100%/4 | Total: 3h 38m | Avg: 54m 36s | Max: 58m 17s 🟩 GCC13 Pass: 100%/16 | Total: 9h 55m | Avg: 37m 14s | Max: 57m 51s 🟩 Intel2023.2.0 Pass: 100%/3 | Total: 3h 02m | Avg: 1h 00m | Max: 1h 03m 🟩 MSVC14.16 Pass: 100%/1 | Total: 59m 43s | Avg: 59m 43s | Max: 59m 43s | Hits: 36%/757 🟩 MSVC14.29 Pass: 100%/2 | Total: 2h 16m | Avg: 1h 08m | Max: 1h 11m | Hits: 36%/1514 🟩 MSVC14.39 Pass: 100%/1 | Total: 1h 10m | Avg: 1h 10m | Max: 1h 10m | Hits: 36%/757 🟩 NVHPC24.7 Pass: 100%/4 | Total: 4h 30m | Avg: 1h 07m | Max: 1h 14m 🟩 cxx_family 🟩 Clang Pass: 100%/48 | Total: 1d 19h | Avg: 54m 59s | Max: 1h 03m 🟩 GCC Pass: 100%/51 | Total: 1d 18h | Avg: 50m 01s | Max: 1h 18m 🟩 Intel Pass: 100%/3 | Total: 3h 02m | Avg: 1h 00m | Max: 1h 03m 🟩 MSVC Pass: 100%/4 | Total: 4h 26m | Avg: 1h 06m | Max: 1h 11m | Hits: 36%/3028 🟩 NVHPC Pass: 100%/4 | Total: 4h 30m | Avg: 1h 07m | Max: 1h 14m 🟩 gpu 🟩 v100 Pass: 100%/110 | Total: 4d 02h | Avg: 53m 43s | Max: 1h 18m | Hits: 36%/3028 🟩 jobs 🟩 Build Pass: 100%/102 | Total: 3d 22h | Avg: 55m 43s | Max: 1h 18m | Hits: 36%/3028 🟩 DeviceLaunch Pass: 100%/1 | Total: 21m 52s | Avg: 21m 52s | Max: 21m 52s 🟩 GraphCapture Pass: 100%/1 | Total: 15m 26s | Avg: 15m 26s | Max: 15m 26s 🟩 HostLaunch Pass: 100%/3 | Total: 1h 09m | Avg: 23m 14s | Max: 26m 15s 🟩 TestGPU Pass: 100%/3 | Total: 1h 59m | Avg: 39m 44s | Max: 40m 29s 🟩 sm 🟩 60;70;80;90 Pass: 100%/3 | Total: 3h 45m | Avg: 1h 15m | Max: 1h 18m 🟩 90a Pass: 100%/4 | Total: 1h 37m | Avg: 24m 22s | Max: 25m 25s 🟩 std 🟩 11 Pass: 100%/30 | Total: 1d 02h | Avg: 52m 02s | Max: 1h 12m 🟩 14 Pass: 100%/29 | Total: 1d 02h | Avg: 55m 23s | Max: 1h 18m | Hits: 36%/1514 🟩 17 Pass: 100%/27 | Total: 1d 01h | Avg: 57m 21s | Max: 1h 15m | Hits: 36%/757 🟩 20 Pass: 100%/24 | Total: 19h 53m | Avg: 49m 44s | Max: 1h 11m | Hits: 36%/757 ```
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 10m 05s | Avg: 5m 02s | Max: 7m 41s

    ```
    🟩 cpu
      🟩 amd64 Pass: 100%/2 | Total: 10m 05s | Avg: 5m 02s | Max: 7m 41s
    🟩 ctk
      🟩 12.6 Pass: 100%/2 | Total: 10m 05s | Avg: 5m 02s | Max: 7m 41s
    🟩 cudacxx
      🟩 nvcc12.6 Pass: 100%/2 | Total: 10m 05s | Avg: 5m 02s | Max: 7m 41s
    🟩 cudacxx_family
      🟩 nvcc Pass: 100%/2 | Total: 10m 05s | Avg: 5m 02s | Max: 7m 41s
    🟩 cxx
      🟩 GCC13 Pass: 100%/2 | Total: 10m 05s | Avg: 5m 02s | Max: 7m 41s
    🟩 cxx_family
      🟩 GCC Pass: 100%/2 | Total: 10m 05s | Avg: 5m 02s | Max: 7m 41s
    🟩 gpu
      🟩 v100 Pass: 100%/2 | Total: 10m 05s | Avg: 5m 02s | Max: 7m 41s
    🟩 jobs
      🟩 Build Pass: 100%/1 | Total: 2m 24s | Avg: 2m 24s | Max: 2m 24s
      🟩 Test Pass: 100%/1 | Total: 7m 41s | Avg: 7m 41s | Max: 7m 41s
    ```
  • 🟩 python: Pass: 100%/1 | Total: 16m 41s | Avg: 16m 41s | Max: 16m 41s

    ```
    🟩 cpu
      🟩 amd64 Pass: 100%/1 | Total: 16m 41s | Avg: 16m 41s | Max: 16m 41s
    🟩 ctk
      🟩 12.6 Pass: 100%/1 | Total: 16m 41s | Avg: 16m 41s | Max: 16m 41s
    🟩 cudacxx
      🟩 nvcc12.6 Pass: 100%/1 | Total: 16m 41s | Avg: 16m 41s | Max: 16m 41s
    🟩 cudacxx_family
      🟩 nvcc Pass: 100%/1 | Total: 16m 41s | Avg: 16m 41s | Max: 16m 41s
    🟩 cxx
      🟩 GCC13 Pass: 100%/1 | Total: 16m 41s | Avg: 16m 41s | Max: 16m 41s
    🟩 cxx_family
      🟩 GCC Pass: 100%/1 | Total: 16m 41s | Avg: 16m 41s | Max: 16m 41s
    🟩 gpu
      🟩 v100 Pass: 100%/1 | Total: 16m 41s | Avg: 16m 41s | Max: 16m 41s
    🟩 jobs
      🟩 Test Pass: 100%/1 | Total: 16m 41s | Avg: 16m 41s | Max: 16m 41s
    ```

👃 Inspect Changes

### Modifications in project?

| | Project |
|-----|---------|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

### Modifications in project or dependencies?

| | Project |
|-----|---------|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃‍ Runner counts (total jobs: 224)

| # | Runner |
|------|------|
| 185 | `linux-amd64-cpu16` |
| 16 | `linux-arm64-cpu16` |
| 14 | `linux-amd64-gpu-v100-latest-1` |
| 9 | `windows-amd64-cpu16` |

bernhardmgruber commented 6 days ago

Thanks for reporting the benchmarks. They look good except for Reduce Max on I8, I32, 2^28. A 14% slowdown unfortunately violates @gevtushenko's rule of "no regressions of more than 2% compared to previous implementation on 2^24+ problem sizes". Could you please investigate the cause of this regression? We should try to fix this.
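For readers who want to apply the quoted rule to a row of the comparison output, here is a minimal sketch. It assumes that %Diff is the change of the compared time relative to the reference time, which matches the rows posted above. The names `percent_diff` and `exceeds_regression_limit` are hypothetical helpers for illustration only and are not part of CCCL or the benchmarking tooling.

```cpp
#include <cassert>

// Hypothetical helper: relative change of the compared time with respect to
// the reference time, in percent. Both times must use the same unit.
double percent_diff(double ref_time, double cmp_time)
{
  assert(ref_time > 0.0);
  return (cmp_time - ref_time) / ref_time * 100.0;
}

// Hypothetical check for the "no regressions of more than 2%" rule.
// A positive percent_diff means the compared build is slower.
bool exceeds_regression_limit(double ref_time, double cmp_time,
                              double limit_percent = 2.0)
{
  return percent_diff(ref_time, cmp_time) > limit_percent;
}
```

With the Reduce Max row for I8 / I32 (107.167 us reference, 122.152 us compared), `percent_diff` gives roughly 13.98, so `exceeds_regression_limit` returns true. The I8 / I64 row (108.946 us versus 109.357 us) stays well under the limit.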

fbusato commented 3 days ago

A 14% slowdown is too large. Let me see if I can fix it.

fbusato commented 3 days ago

@bernhardmgruber (and @gevtushenko) All routines that show regressions here had been "artificially" improved by the following problem: non-standard binary operators were recognized as operators that can be optimized with a binary-tree reduction. In fact, the code can't optimize these operators because it has no knowledge of their structure. In summary, these regressions cannot be avoided for user-provided binary operators.
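The distinction between recognized and user-provided operators can be sketched as follows. This is an illustration under assumptions, not the code in this PR or in CUB: the trait `is_known_operator`, the functor `maximum`, and the functions `reduce_sequential`, `reduce_binary_tree`, and `thread_reduce` are all hypothetical names, and the sketch only dispatches on the operator type on the host.

```cpp
#include <cstddef>
#include <functional>
#include <type_traits>

// Hypothetical trait: true only for operator types whose semantics are known.
template <typename Op>
struct is_known_operator : std::false_type {};

template <typename T>
struct is_known_operator<std::plus<T>> : std::true_type {};

// Hypothetical library-provided max functor.
struct maximum
{
  template <typename T>
  T operator()(T a, T b) const { return a < b ? b : a; }
};

template <>
struct is_known_operator<maximum> : std::true_type {};

enum class strategy { sequential, binary_tree };

template <typename Op>
constexpr strategy select_strategy()
{
  return is_known_operator<Op>::value ? strategy::binary_tree : strategy::sequential;
}

// Left-to-right chain: ((in[0] op in[1]) op in[2]) ...
template <typename T, std::size_t N, typename Op>
T reduce_sequential(const T (&in)[N], Op op)
{
  T acc = in[0];
  for (std::size_t i = 1; i < N; ++i) { acc = op(acc, in[i]); }
  return acc;
}

// Pairwise combination of neighbours, halving the active width each round.
// Element order is preserved, so only associativity is required.
template <typename T, std::size_t N, typename Op>
T reduce_binary_tree(const T (&in)[N], Op op)
{
  T tmp[N];
  for (std::size_t i = 0; i < N; ++i) { tmp[i] = in[i]; }
  for (std::size_t width = N; width > 1; width = (width + 1) / 2)
  {
    for (std::size_t i = 0; i < width / 2; ++i) { tmp[i] = op(tmp[2 * i], tmp[2 * i + 1]); }
    if (width % 2 == 1) { tmp[width / 2] = tmp[width - 1]; } // carry the odd element
  }
  return tmp[0];
}

// Tag dispatch: the tree path is taken only for known operators.
template <typename T, std::size_t N, typename Op>
T thread_reduce_impl(const T (&in)[N], Op op, std::true_type) { return reduce_binary_tree(in, op); }

template <typename T, std::size_t N, typename Op>
T thread_reduce_impl(const T (&in)[N], Op op, std::false_type) { return reduce_sequential(in, op); }

template <typename T, std::size_t N, typename Op>
T thread_reduce(const T (&in)[N], Op op)
{
  return thread_reduce_impl(in, op, is_known_operator<Op>{});
}
```

In this sketch a lambda such as `[](int a, int b) { return a < b ? b : a; }` computes the same result as `maximum`, but its type is opaque to the trait, so it takes the sequential path. That is the situation described above for user-provided binary operators.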

github-actions[bot] commented 3 days ago

🟩 CI finished in 3h 55m: Pass: 100%/224 | Total: 7d 00h | Avg: 45m 13s | Max: 1h 20m | Hits: 15%/12288
  • 🟩 thrust: Pass: 100%/111 | Total: 2d 17h | Avg: 35m 16s | Max: 1h 20m | Hits: 20%/9260

    ``` 🟩 cmake_options 🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2 | Total: 50m 31s | Avg: 25m 15s | Max: 34m 58s 🟩 cpu 🟩 amd64 Pass: 100%/103 | Total: 2d 12h | Avg: 35m 31s | Max: 1h 20m | Hits: 20%/9260 🟩 arm64 Pass: 100%/8 | Total: 4h 16m | Avg: 32m 02s | Max: 37m 51s 🟩 ctk 🟩 11.1 Pass: 100%/15 | Total: 8h 22m | Avg: 33m 28s | Max: 1h 14m | Hits: 0%/1852 🟩 11.8 Pass: 100%/3 | Total: 2h 06m | Avg: 42m 05s | Max: 45m 32s 🟩 12.5 Pass: 100%/4 | Total: 4h 53m | Avg: 1h 13m | Max: 1h 15m 🟩 12.6 Pass: 100%/89 | Total: 2d 01h | Avg: 33m 37s | Max: 1h 20m | Hits: 25%/7408 🟩 cudacxx 🟩 ClangCUDA18 Pass: 100%/4 | Total: 1h 48m | Avg: 27m 07s | Max: 28m 46s 🟩 nvcc11.1 Pass: 100%/15 | Total: 8h 22m | Avg: 33m 28s | Max: 1h 14m | Hits: 0%/1852 🟩 nvcc11.8 Pass: 100%/3 | Total: 2h 06m | Avg: 42m 05s | Max: 45m 32s 🟩 nvcc12.5 Pass: 100%/4 | Total: 4h 53m | Avg: 1h 13m | Max: 1h 15m 🟩 nvcc12.6 Pass: 100%/85 | Total: 2d 00h | Avg: 33m 56s | Max: 1h 20m | Hits: 25%/7408 🟩 cudacxx_family 🟩 ClangCUDA Pass: 100%/4 | Total: 1h 48m | Avg: 27m 07s | Max: 28m 46s 🟩 nvcc Pass: 100%/107 | Total: 2d 15h | Avg: 35m 34s | Max: 1h 20m | Hits: 20%/9260 🟩 cxx 🟩 Clang9 Pass: 100%/6 | Total: 3h 15m | Avg: 32m 36s | Max: 39m 35s 🟩 Clang10 Pass: 100%/3 | Total: 1h 52m | Avg: 37m 20s | Max: 41m 33s 🟩 Clang11 Pass: 100%/4 | Total: 2h 15m | Avg: 33m 52s | Max: 37m 44s 🟩 Clang12 Pass: 100%/4 | Total: 2h 15m | Avg: 33m 49s | Max: 37m 01s 🟩 Clang13 Pass: 100%/4 | Total: 2h 14m | Avg: 33m 35s | Max: 35m 50s 🟩 Clang14 Pass: 100%/4 | Total: 2h 16m | Avg: 34m 02s | Max: 37m 13s 🟩 Clang15 Pass: 100%/4 | Total: 2h 15m | Avg: 33m 54s | Max: 36m 38s 🟩 Clang16 Pass: 100%/4 | Total: 2h 20m | Avg: 35m 06s | Max: 39m 23s 🟩 Clang17 Pass: 100%/4 | Total: 2h 20m | Avg: 35m 08s | Max: 40m 14s 🟩 Clang18 Pass: 100%/11 | Total: 4h 51m | Avg: 26m 28s | Max: 37m 37s 🟩 GCC6 Pass: 100%/2 | Total: 59m 40s | Avg: 29m 50s | Max: 34m 46s 🟩 GCC7 Pass: 100%/6 | Total: 3h 12m | Avg: 32m 08s | Max: 39m 17s 🟩 GCC8 Pass: 100%/6 | 
Total: 3h 08m | Avg: 31m 25s | Max: 35m 54s 🟩 GCC9 Pass: 100%/6 | Total: 3h 19m | Avg: 33m 16s | Max: 39m 14s 🟩 GCC10 Pass: 100%/4 | Total: 2h 20m | Avg: 35m 05s | Max: 40m 59s 🟩 GCC11 Pass: 100%/7 | Total: 4h 24m | Avg: 37m 49s | Max: 45m 32s 🟩 GCC12 Pass: 100%/4 | Total: 2h 34m | Avg: 38m 32s | Max: 46m 31s 🟩 GCC13 Pass: 100%/16 | Total: 6h 29m | Avg: 24m 21s | Max: 45m 21s 🟩 Intel2023.2.0 Pass: 100%/3 | Total: 2h 33m | Avg: 51m 13s | Max: 55m 39s 🟩 MSVC14.16 Pass: 100%/1 | Total: 1h 14m | Avg: 1h 14m | Max: 1h 14m | Hits: 0%/1852 🟩 MSVC14.29 Pass: 100%/2 | Total: 2h 21m | Avg: 1h 10m | Max: 1h 17m | Hits: 0%/3704 🟩 MSVC14.39 Pass: 100%/2 | Total: 1h 44m | Avg: 52m 25s | Max: 1h 20m | Hits: 49%/3704 🟩 NVHPC24.7 Pass: 100%/4 | Total: 4h 53m | Avg: 1h 13m | Max: 1h 15m 🟩 cxx_family 🟩 Clang Pass: 100%/48 | Total: 1d 01h | Avg: 32m 26s | Max: 41m 33s 🟩 GCC Pass: 100%/51 | Total: 1d 02h | Avg: 31m 10s | Max: 46m 31s 🟩 Intel Pass: 100%/3 | Total: 2h 33m | Avg: 51m 13s | Max: 55m 39s 🟩 MSVC Pass: 100%/5 | Total: 5h 21m | Avg: 1h 04m | Max: 1h 20m | Hits: 20%/9260 🟩 NVHPC Pass: 100%/4 | Total: 4h 53m | Avg: 1h 13m | Max: 1h 15m 🟩 gpu 🟩 v100 Pass: 100%/111 | Total: 2d 17h | Avg: 35m 16s | Max: 1h 20m | Hits: 20%/9260 🟩 jobs 🟩 Build Pass: 100%/103 | Total: 2d 15h | Avg: 36m 58s | Max: 1h 20m | Hits: 0%/7408 🟩 TestCPU Pass: 100%/4 | Total: 45m 04s | Avg: 11m 16s | Max: 24m 02s | Hits: 99%/1852 🟩 TestGPU Pass: 100%/4 | Total: 1h 00m | Avg: 15m 12s | Max: 16m 18s 🟩 sm 🟩 60;70;80;90 Pass: 100%/3 | Total: 2h 06m | Avg: 42m 05s | Max: 45m 32s 🟩 90a Pass: 100%/4 | Total: 1h 24m | Avg: 21m 02s | Max: 25m 36s 🟩 std 🟩 11 Pass: 100%/30 | Total: 14h 30m | Avg: 29m 00s | Max: 1h 09m 🟩 14 Pass: 100%/29 | Total: 18h 48m | Avg: 38m 55s | Max: 1h 15m | Hits: 0%/3704 🟩 17 Pass: 100%/27 | Total: 17h 33m | Avg: 39m 00s | Max: 1h 17m | Hits: 0%/1852 🟩 20 Pass: 100%/23 | Total: 13h 32m | Avg: 35m 18s | Max: 1h 20m | Hits: 49%/3704 ```
  • 🟩 cub: Pass: 100%/110 | Total: 4d 07h | Avg: 56m 16s | Max: 1h 19m | Hits: 0%/3028

    ```
    🟩 cpu
      🟩 amd64 Pass: 100%/102 | Total: 3d 23h | Avg: 55m 54s | Max: 1h 19m | Hits: 0%/3028
      🟩 arm64 Pass: 100%/8 | Total: 8h 07m | Avg: 1h 00m | Max: 1h 07m
    🟩 ctk
      🟩 11.1 Pass: 100%/15 | Total: 12h 50m | Avg: 51m 23s | Max: 58m 56s | Hits: 0%/757
      🟩 11.8 Pass: 100%/3 | Total: 3h 47m | Avg: 1h 15m | Max: 1h 19m
      🟩 12.5 Pass: 100%/4 | Total: 4h 36m | Avg: 1h 09m | Max: 1h 12m
      🟩 12.6 Pass: 100%/88 | Total: 3d 09h | Avg: 55m 52s | Max: 1h 19m | Hits: 0%/2271
    🟩 cudacxx
      🟩 ClangCUDA18 Pass: 100%/4 | Total: 4h 06m | Avg: 1h 01m | Max: 1h 02m
      🟩 nvcc11.1 Pass: 100%/15 | Total: 12h 50m | Avg: 51m 23s | Max: 58m 56s | Hits: 0%/757
      🟩 nvcc11.8 Pass: 100%/3 | Total: 3h 47m | Avg: 1h 15m | Max: 1h 19m
      🟩 nvcc12.5 Pass: 100%/4 | Total: 4h 36m | Avg: 1h 09m | Max: 1h 12m
      🟩 nvcc12.6 Pass: 100%/84 | Total: 3d 05h | Avg: 55m 35s | Max: 1h 19m | Hits: 0%/2271
    🟩 cudacxx_family
      🟩 ClangCUDA Pass: 100%/4 | Total: 4h 06m | Avg: 1h 01m | Max: 1h 02m
      🟩 nvcc Pass: 100%/106 | Total: 4d 03h | Avg: 56m 04s | Max: 1h 19m | Hits: 0%/3028
    🟩 cxx
      🟩 Clang9 Pass: 100%/6 | Total: 5h 20m | Avg: 53m 29s | Max: 1h 01m
      🟩 Clang10 Pass: 100%/3 | Total: 3h 14m | Avg: 1h 04m | Max: 1h 15m
      🟩 Clang11 Pass: 100%/4 | Total: 3h 55m | Avg: 58m 53s | Max: 1h 00m
      🟩 Clang12 Pass: 100%/4 | Total: 4h 04m | Avg: 1h 01m | Max: 1h 16m
      🟩 Clang13 Pass: 100%/4 | Total: 3h 44m | Avg: 56m 03s | Max: 57m 41s
      🟩 Clang14 Pass: 100%/4 | Total: 3h 49m | Avg: 57m 15s | Max: 1h 01m
      🟩 Clang15 Pass: 100%/4 | Total: 3h 48m | Avg: 57m 00s | Max: 59m 17s
      🟩 Clang16 Pass: 100%/4 | Total: 3h 44m | Avg: 56m 06s | Max: 59m 22s
      🟩 Clang17 Pass: 100%/4 | Total: 4h 03m | Avg: 1h 00m | Max: 1h 04m
      🟩 Clang18 Pass: 100%/11 | Total: 9h 41m | Avg: 52m 53s | Max: 1h 04m
      🟩 GCC6 Pass: 100%/2 | Total: 1h 41m | Avg: 50m 38s | Max: 51m 24s
      🟩 GCC7 Pass: 100%/6 | Total: 5h 31m | Avg: 55m 18s | Max: 59m 59s
      🟩 GCC8 Pass: 100%/6 | Total: 5h 24m | Avg: 54m 00s | Max: 58m 23s
      🟩 GCC9 Pass: 100%/6 | Total: 5h 27m | Avg: 54m 36s | Max: 1h 02m
      🟩 GCC10 Pass: 100%/4 | Total: 4h 17m | Avg: 1h 04m | Max: 1h 19m
      🟩 GCC11 Pass: 100%/7 | Total: 7h 43m | Avg: 1h 06m | Max: 1h 19m
      🟩 GCC12 Pass: 100%/4 | Total: 3h 49m | Avg: 57m 17s | Max: 1h 00m
      🟩 GCC13 Pass: 100%/16 | Total: 11h 44m | Avg: 44m 00s | Max: 1h 11m
      🟩 Intel2023.2.0 Pass: 100%/3 | Total: 3h 10m | Avg: 1h 03m | Max: 1h 08m
      🟩 MSVC14.16 Pass: 100%/1 | Total: 58m 56s | Avg: 58m 56s | Max: 58m 56s | Hits: 0%/757
      🟩 MSVC14.29 Pass: 100%/2 | Total: 2h 08m | Avg: 1h 04m | Max: 1h 05m | Hits: 0%/1514
      🟩 MSVC14.39 Pass: 100%/1 | Total: 1h 10m | Avg: 1h 10m | Max: 1h 10m | Hits: 0%/757
      🟩 NVHPC24.7 Pass: 100%/4 | Total: 4h 36m | Avg: 1h 09m | Max: 1h 12m
    🟩 cxx_family
      🟩 Clang Pass: 100%/48 | Total: 1d 21h | Avg: 56m 48s | Max: 1h 16m
      🟩 GCC Pass: 100%/51 | Total: 1d 21h | Avg: 53m 42s | Max: 1h 19m
      🟩 Intel Pass: 100%/3 | Total: 3h 10m | Avg: 1h 03m | Max: 1h 08m
      🟩 MSVC Pass: 100%/4 | Total: 4h 17m | Avg: 1h 04m | Max: 1h 10m | Hits: 0%/3028
      🟩 NVHPC Pass: 100%/4 | Total: 4h 36m | Avg: 1h 09m | Max: 1h 12m
    🟩 gpu
      🟩 v100 Pass: 100%/110 | Total: 4d 07h | Avg: 56m 16s | Max: 1h 19m | Hits: 0%/3028
    🟩 jobs
      🟩 Build Pass: 100%/102 | Total: 4d 02h | Avg: 58m 03s | Max: 1h 19m | Hits: 0%/3028
      🟩 DeviceLaunch Pass: 100%/1 | Total: 19m 55s | Avg: 19m 55s | Max: 19m 55s
      🟩 GraphCapture Pass: 100%/1 | Total: 1h 06m | Avg: 1h 06m | Max: 1h 06m
      🟩 HostLaunch Pass: 100%/3 | Total: 1h 06m | Avg: 22m 11s | Max: 28m 31s
      🟩 TestGPU Pass: 100%/3 | Total: 1h 55m | Avg: 38m 26s | Max: 1h 11m
    🟩 sm
      🟩 60;70;80;90 Pass: 100%/3 | Total: 3h 47m | Avg: 1h 15m | Max: 1h 19m
      🟩 90a Pass: 100%/4 | Total: 1h 45m | Avg: 26m 17s | Max: 29m 28s
    🟩 std
      🟩 11 Pass: 100%/30 | Total: 1d 03h | Avg: 54m 05s | Max: 1h 13m
      🟩 14 Pass: 100%/29 | Total: 1d 04h | Avg: 58m 28s | Max: 1h 19m | Hits: 0%/1514
      🟩 17 Pass: 100%/27 | Total: 1d 01h | Avg: 57m 25s | Max: 1h 19m | Hits: 0%/757
      🟩 20 Pass: 100%/24 | Total: 22h 01m | Avg: 55m 03s | Max: 1h 16m | Hits: 0%/757
    ```
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 49s | Avg: 4m 54s | Max: 7m 37s

    ```
    🟩 cpu
      🟩 amd64 Pass: 100%/2 | Total: 9m 49s | Avg: 4m 54s | Max: 7m 37s
    🟩 ctk
      🟩 12.6 Pass: 100%/2 | Total: 9m 49s | Avg: 4m 54s | Max: 7m 37s
    🟩 cudacxx
      🟩 nvcc12.6 Pass: 100%/2 | Total: 9m 49s | Avg: 4m 54s | Max: 7m 37s
    🟩 cudacxx_family
      🟩 nvcc Pass: 100%/2 | Total: 9m 49s | Avg: 4m 54s | Max: 7m 37s
    🟩 cxx
      🟩 GCC13 Pass: 100%/2 | Total: 9m 49s | Avg: 4m 54s | Max: 7m 37s
    🟩 cxx_family
      🟩 GCC Pass: 100%/2 | Total: 9m 49s | Avg: 4m 54s | Max: 7m 37s
    🟩 gpu
      🟩 v100 Pass: 100%/2 | Total: 9m 49s | Avg: 4m 54s | Max: 7m 37s
    🟩 jobs
      🟩 Build Pass: 100%/1 | Total: 2m 12s | Avg: 2m 12s | Max: 2m 12s
      🟩 Test Pass: 100%/1 | Total: 7m 37s | Avg: 7m 37s | Max: 7m 37s
    ```
  • 🟩 python: Pass: 100%/1 | Total: 15m 55s | Avg: 15m 55s | Max: 15m 55s

    ```
    🟩 cpu
      🟩 amd64 Pass: 100%/1 | Total: 15m 55s | Avg: 15m 55s | Max: 15m 55s
    🟩 ctk
      🟩 12.6 Pass: 100%/1 | Total: 15m 55s | Avg: 15m 55s | Max: 15m 55s
    🟩 cudacxx
      🟩 nvcc12.6 Pass: 100%/1 | Total: 15m 55s | Avg: 15m 55s | Max: 15m 55s
    🟩 cudacxx_family
      🟩 nvcc Pass: 100%/1 | Total: 15m 55s | Avg: 15m 55s | Max: 15m 55s
    🟩 cxx
      🟩 GCC13 Pass: 100%/1 | Total: 15m 55s | Avg: 15m 55s | Max: 15m 55s
    🟩 cxx_family
      🟩 GCC Pass: 100%/1 | Total: 15m 55s | Avg: 15m 55s | Max: 15m 55s
    🟩 gpu
      🟩 v100 Pass: 100%/1 | Total: 15m 55s | Avg: 15m 55s | Max: 15m 55s
    🟩 jobs
      🟩 Test Pass: 100%/1 | Total: 15m 55s | Avg: 15m 55s | Max: 15m 55s
    ```

👃 Inspect Changes

### Modifications in project?

|     | Project |
|-----|---------|
|     | CCCL Infrastructure |
|     | libcu++ |
| +/- | CUB |
|     | Thrust |
|     | CUDA Experimental |
|     | python |
|     | CCCL C Parallel Library |
|     | Catch2Helper |

### Modifications in project or dependencies?

|     | Project |
|-----|---------|
|     | CCCL Infrastructure |
|     | libcu++ |
| +/- | CUB |
| +/- | Thrust |
|     | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃‍ Runner counts (total jobs: 224)

| #   | Runner |
|-----|--------|
| 185 | `linux-amd64-cpu16` |
| 16  | `linux-arm64-cpu16` |
| 14  | `linux-amd64-gpu-v100-latest-1` |
| 9   | `windows-amd64-cpu16` |

github-actions[bot] commented 2 days ago
🟩 CI finished in 2h 54m: Pass: 100%/224 | Total: 6d 15h | Avg: 42m 39s | Max: 1h 17m | Hits: 60%/12288
  • 🟩 thrust: Pass: 100%/111 | Total: 2d 13h | Avg: 33m 04s | Max: 1h 05m | Hits: 70%/9260

    ```
    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2 | Total: 1h 11m | Avg: 35m 37s | Max: 38m 02s
    🟩 cpu
      🟩 amd64 Pass: 100%/103 | Total: 2d 09h | Avg: 33m 16s | Max: 1h 05m | Hits: 70%/9260
      🟩 arm64 Pass: 100%/8 | Total: 4h 02m | Avg: 30m 22s | Max: 35m 51s
    🟩 ctk
      🟩 11.1 Pass: 100%/15 | Total: 8h 03m | Avg: 32m 15s | Max: 56m 28s | Hits: 62%/1852
      🟩 11.8 Pass: 100%/3 | Total: 2h 00m | Avg: 40m 06s | Max: 44m 05s
      🟩 12.5 Pass: 100%/4 | Total: 3h 38m | Avg: 54m 42s | Max: 1h 05m
      🟩 12.6 Pass: 100%/89 | Total: 1d 23h | Avg: 31m 59s | Max: 1h 00m | Hits: 71%/7408
    🟩 cudacxx
      🟩 ClangCUDA18 Pass: 100%/4 | Total: 1h 43m | Avg: 25m 59s | Max: 27m 25s
      🟩 nvcc11.1 Pass: 100%/15 | Total: 8h 03m | Avg: 32m 15s | Max: 56m 28s | Hits: 62%/1852
      🟩 nvcc11.8 Pass: 100%/3 | Total: 2h 00m | Avg: 40m 06s | Max: 44m 05s
      🟩 nvcc12.5 Pass: 100%/4 | Total: 3h 38m | Avg: 54m 42s | Max: 1h 05m
      🟩 nvcc12.6 Pass: 100%/85 | Total: 1d 21h | Avg: 32m 16s | Max: 1h 00m | Hits: 71%/7408
    🟩 cudacxx_family
      🟩 ClangCUDA Pass: 100%/4 | Total: 1h 43m | Avg: 25m 59s | Max: 27m 25s
      🟩 nvcc Pass: 100%/107 | Total: 2d 11h | Avg: 33m 20s | Max: 1h 05m | Hits: 70%/9260
    🟩 cxx
      🟩 Clang9 Pass: 100%/6 | Total: 3h 03m | Avg: 30m 39s | Max: 33m 41s
      🟩 Clang10 Pass: 100%/3 | Total: 1h 38m | Avg: 32m 53s | Max: 35m 02s
      🟩 Clang11 Pass: 100%/4 | Total: 2h 09m | Avg: 32m 21s | Max: 34m 34s
      🟩 Clang12 Pass: 100%/4 | Total: 2h 11m | Avg: 32m 51s | Max: 37m 05s
      🟩 Clang13 Pass: 100%/4 | Total: 2h 06m | Avg: 31m 38s | Max: 33m 47s
      🟩 Clang14 Pass: 100%/4 | Total: 2h 04m | Avg: 31m 02s | Max: 33m 49s
      🟩 Clang15 Pass: 100%/4 | Total: 2h 07m | Avg: 31m 48s | Max: 33m 28s
      🟩 Clang16 Pass: 100%/4 | Total: 2h 07m | Avg: 31m 54s | Max: 33m 57s
      🟩 Clang17 Pass: 100%/4 | Total: 2h 15m | Avg: 33m 58s | Max: 36m 24s
      🟩 Clang18 Pass: 100%/11 | Total: 4h 29m | Avg: 24m 31s | Max: 33m 06s
      🟩 GCC6 Pass: 100%/2 | Total: 1h 08m | Avg: 34m 04s | Max: 41m 22s
      🟩 GCC7 Pass: 100%/6 | Total: 3h 15m | Avg: 32m 30s | Max: 38m 26s
      🟩 GCC8 Pass: 100%/6 | Total: 3h 01m | Avg: 30m 16s | Max: 34m 46s
      🟩 GCC9 Pass: 100%/6 | Total: 3h 26m | Avg: 34m 22s | Max: 45m 00s
      🟩 GCC10 Pass: 100%/4 | Total: 2h 31m | Avg: 37m 59s | Max: 45m 00s
      🟩 GCC11 Pass: 100%/7 | Total: 4h 20m | Avg: 37m 10s | Max: 44m 05s
      🟩 GCC12 Pass: 100%/4 | Total: 2h 39m | Avg: 39m 48s | Max: 48m 57s
      🟩 GCC13 Pass: 100%/16 | Total: 6h 39m | Avg: 24m 59s | Max: 41m 32s
      🟩 Intel2023.2.0 Pass: 100%/3 | Total: 1h 59m | Avg: 39m 55s | Max: 42m 46s
      🟩 MSVC14.16 Pass: 100%/1 | Total: 56m 28s | Avg: 56m 28s | Max: 56m 28s | Hits: 62%/1852
      🟩 MSVC14.29 Pass: 100%/2 | Total: 1h 54m | Avg: 57m 02s | Max: 57m 57s | Hits: 62%/3704
      🟩 MSVC14.39 Pass: 100%/2 | Total: 1h 24m | Avg: 42m 25s | Max: 1h 00m | Hits: 81%/3704
      🟩 NVHPC24.7 Pass: 100%/4 | Total: 3h 38m | Avg: 54m 42s | Max: 1h 05m
    🟩 cxx_family
      🟩 Clang Pass: 100%/48 | Total: 1d 00h | Avg: 30m 18s | Max: 37m 05s
      🟩 GCC Pass: 100%/51 | Total: 1d 03h | Avg: 31m 48s | Max: 48m 57s
      🟩 Intel Pass: 100%/3 | Total: 1h 59m | Avg: 39m 55s | Max: 42m 46s
      🟩 MSVC Pass: 100%/5 | Total: 4h 15m | Avg: 51m 04s | Max: 1h 00m | Hits: 70%/9260
      🟩 NVHPC Pass: 100%/4 | Total: 3h 38m | Avg: 54m 42s | Max: 1h 05m
    🟩 gpu
      🟩 v100 Pass: 100%/111 | Total: 2d 13h | Avg: 33m 04s | Max: 1h 05m | Hits: 70%/9260
    🟩 jobs
      🟩 Build Pass: 100%/103 | Total: 2d 11h | Avg: 34m 25s | Max: 1h 05m | Hits: 62%/7408
      🟩 TestCPU Pass: 100%/4 | Total: 45m 30s | Avg: 11m 22s | Max: 24m 24s | Hits: 99%/1852
      🟩 TestGPU Pass: 100%/4 | Total: 1h 19m | Avg: 19m 46s | Max: 38m 02s
    🟩 sm
      🟩 60;70;80;90 Pass: 100%/3 | Total: 2h 00m | Avg: 40m 06s | Max: 44m 05s
      🟩 90a Pass: 100%/4 | Total: 1h 26m | Avg: 21m 37s | Max: 24m 05s
    🟩 std
      🟩 11 Pass: 100%/30 | Total: 13h 46m | Avg: 27m 32s | Max: 43m 56s
      🟩 14 Pass: 100%/29 | Total: 17h 33m | Avg: 36m 20s | Max: 56m 28s | Hits: 62%/3704
      🟩 17 Pass: 100%/27 | Total: 16h 36m | Avg: 36m 54s | Max: 1h 05m | Hits: 62%/1852
      🟩 20 Pass: 100%/23 | Total: 12h 03m | Avg: 31m 26s | Max: 1h 00m | Hits: 81%/3704
    ```
  • 🟩 cub: Pass: 100%/110 | Total: 4d 01h | Avg: 53m 16s | Max: 1h 17m | Hits: 31%/3028

    ```
    🟩 cpu
      🟩 amd64 Pass: 100%/102 | Total: 3d 17h | Avg: 52m 45s | Max: 1h 17m | Hits: 31%/3028
      🟩 arm64 Pass: 100%/8 | Total: 7h 58m | Avg: 59m 50s | Max: 1h 02m
    🟩 ctk
      🟩 11.1 Pass: 100%/15 | Total: 12h 35m | Avg: 50m 23s | Max: 1h 01m | Hits: 31%/757
      🟩 11.8 Pass: 100%/3 | Total: 3h 43m | Avg: 1h 14m | Max: 1h 17m
      🟩 12.5 Pass: 100%/4 | Total: 4h 21m | Avg: 1h 05m | Max: 1h 08m
      🟩 12.6 Pass: 100%/88 | Total: 3d 04h | Avg: 52m 29s | Max: 1h 10m | Hits: 31%/2271
    🟩 cudacxx
      🟩 ClangCUDA18 Pass: 100%/4 | Total: 3h 52m | Avg: 58m 00s | Max: 59m 26s
      🟩 nvcc11.1 Pass: 100%/15 | Total: 12h 35m | Avg: 50m 23s | Max: 1h 01m | Hits: 31%/757
      🟩 nvcc11.8 Pass: 100%/3 | Total: 3h 43m | Avg: 1h 14m | Max: 1h 17m
      🟩 nvcc12.5 Pass: 100%/4 | Total: 4h 21m | Avg: 1h 05m | Max: 1h 08m
      🟩 nvcc12.6 Pass: 100%/84 | Total: 3d 01h | Avg: 52m 14s | Max: 1h 10m | Hits: 31%/2271
    🟩 cudacxx_family
      🟩 ClangCUDA Pass: 100%/4 | Total: 3h 52m | Avg: 58m 00s | Max: 59m 26s
      🟩 nvcc Pass: 100%/106 | Total: 3d 21h | Avg: 53m 05s | Max: 1h 17m | Hits: 31%/3028
    🟩 cxx
      🟩 Clang9 Pass: 100%/6 | Total: 5h 09m | Avg: 51m 30s | Max: 53m 44s
      🟩 Clang10 Pass: 100%/3 | Total: 2h 49m | Avg: 56m 34s | Max: 1h 02m
      🟩 Clang11 Pass: 100%/4 | Total: 3h 53m | Avg: 58m 26s | Max: 1h 00m
      🟩 Clang12 Pass: 100%/4 | Total: 3h 40m | Avg: 55m 04s | Max: 59m 17s
      🟩 Clang13 Pass: 100%/4 | Total: 3h 39m | Avg: 54m 56s | Max: 57m 21s
      🟩 Clang14 Pass: 100%/4 | Total: 3h 45m | Avg: 56m 19s | Max: 59m 59s
      🟩 Clang15 Pass: 100%/4 | Total: 3h 41m | Avg: 55m 21s | Max: 1h 01m
      🟩 Clang16 Pass: 100%/4 | Total: 3h 55m | Avg: 58m 51s | Max: 1h 01m
      🟩 Clang17 Pass: 100%/4 | Total: 3h 44m | Avg: 56m 00s | Max: 59m 12s
      🟩 Clang18 Pass: 100%/11 | Total: 9h 25m | Avg: 51m 24s | Max: 1h 02m
      🟩 GCC6 Pass: 100%/2 | Total: 1h 38m | Avg: 49m 26s | Max: 49m 30s
      🟩 GCC7 Pass: 100%/6 | Total: 5h 06m | Avg: 51m 09s | Max: 53m 15s
      🟩 GCC8 Pass: 100%/6 | Total: 5h 26m | Avg: 54m 20s | Max: 59m 40s
      🟩 GCC9 Pass: 100%/6 | Total: 5h 18m | Avg: 53m 04s | Max: 59m 43s
      🟩 GCC10 Pass: 100%/4 | Total: 3h 46m | Avg: 56m 42s | Max: 59m 57s
      🟩 GCC11 Pass: 100%/7 | Total: 7h 20m | Avg: 1h 02m | Max: 1h 17m
      🟩 GCC12 Pass: 100%/4 | Total: 3h 43m | Avg: 55m 51s | Max: 1h 01m
      🟩 GCC13 Pass: 100%/16 | Total: 9h 51m | Avg: 36m 58s | Max: 1h 02m
      🟩 Intel2023.2.0 Pass: 100%/3 | Total: 3h 06m | Avg: 1h 02m | Max: 1h 05m
      🟩 MSVC14.16 Pass: 100%/1 | Total: 1h 01m | Avg: 1h 01m | Max: 1h 01m | Hits: 31%/757
      🟩 MSVC14.29 Pass: 100%/2 | Total: 2h 11m | Avg: 1h 05m | Max: 1h 10m | Hits: 31%/1514
      🟩 MSVC14.39 Pass: 100%/1 | Total: 1h 04m | Avg: 1h 04m | Max: 1h 04m | Hits: 31%/757
      🟩 NVHPC24.7 Pass: 100%/4 | Total: 4h 21m | Avg: 1h 05m | Max: 1h 08m
    🟩 cxx_family
      🟩 Clang Pass: 100%/48 | Total: 1d 19h | Avg: 54m 40s | Max: 1h 02m
      🟩 GCC Pass: 100%/51 | Total: 1d 18h | Avg: 49m 39s | Max: 1h 17m
      🟩 Intel Pass: 100%/3 | Total: 3h 06m | Avg: 1h 02m | Max: 1h 05m
      🟩 MSVC Pass: 100%/4 | Total: 4h 16m | Avg: 1h 04m | Max: 1h 10m | Hits: 31%/3028
      🟩 NVHPC Pass: 100%/4 | Total: 4h 21m | Avg: 1h 05m | Max: 1h 08m
    🟩 gpu
      🟩 v100 Pass: 100%/110 | Total: 4d 01h | Avg: 53m 16s | Max: 1h 17m | Hits: 31%/3028
    🟩 jobs
      🟩 Build Pass: 100%/102 | Total: 3d 22h | Avg: 55m 46s | Max: 1h 17m | Hits: 31%/3028
      🟩 DeviceLaunch Pass: 100%/1 | Total: 32m 32s | Avg: 32m 32s | Max: 32m 32s
      🟩 GraphCapture Pass: 100%/1 | Total: 16m 28s | Avg: 16m 28s | Max: 16m 28s
      🟩 HostLaunch Pass: 100%/3 | Total: 54m 28s | Avg: 18m 09s | Max: 19m 08s
      🟩 TestGPU Pass: 100%/3 | Total: 1h 08m | Avg: 22m 44s | Max: 25m 35s
    🟩 sm
      🟩 60;70;80;90 Pass: 100%/3 | Total: 3h 43m | Avg: 1h 14m | Max: 1h 17m
      🟩 90a Pass: 100%/4 | Total: 1h 36m | Avg: 24m 14s | Max: 26m 10s
    🟩 std
      🟩 11 Pass: 100%/30 | Total: 1d 02h | Avg: 53m 12s | Max: 1h 15m
      🟩 14 Pass: 100%/29 | Total: 1d 02h | Avg: 54m 58s | Max: 1h 10m | Hits: 31%/1514
      🟩 17 Pass: 100%/27 | Total: 1d 01h | Avg: 56m 33s | Max: 1h 17m | Hits: 31%/757
      🟩 20 Pass: 100%/24 | Total: 19h 02m | Avg: 47m 37s | Max: 1h 04m | Hits: 31%/757
    ```
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 9m 28s | Avg: 4m 44s | Max: 7m 08s

    ```
    🟩 cpu
      🟩 amd64 Pass: 100%/2 | Total: 9m 28s | Avg: 4m 44s | Max: 7m 08s
    🟩 ctk
      🟩 12.6 Pass: 100%/2 | Total: 9m 28s | Avg: 4m 44s | Max: 7m 08s
    🟩 cudacxx
      🟩 nvcc12.6 Pass: 100%/2 | Total: 9m 28s | Avg: 4m 44s | Max: 7m 08s
    🟩 cudacxx_family
      🟩 nvcc Pass: 100%/2 | Total: 9m 28s | Avg: 4m 44s | Max: 7m 08s
    🟩 cxx
      🟩 GCC13 Pass: 100%/2 | Total: 9m 28s | Avg: 4m 44s | Max: 7m 08s
    🟩 cxx_family
      🟩 GCC Pass: 100%/2 | Total: 9m 28s | Avg: 4m 44s | Max: 7m 08s
    🟩 gpu
      🟩 v100 Pass: 100%/2 | Total: 9m 28s | Avg: 4m 44s | Max: 7m 08s
    🟩 jobs
      🟩 Build Pass: 100%/1 | Total: 2m 20s | Avg: 2m 20s | Max: 2m 20s
      🟩 Test Pass: 100%/1 | Total: 7m 08s | Avg: 7m 08s | Max: 7m 08s
    ```
  • 🟩 python: Pass: 100%/1 | Total: 15m 22s | Avg: 15m 22s | Max: 15m 22s

    ```
    🟩 cpu
      🟩 amd64 Pass: 100%/1 | Total: 15m 22s | Avg: 15m 22s | Max: 15m 22s
    🟩 ctk
      🟩 12.6 Pass: 100%/1 | Total: 15m 22s | Avg: 15m 22s | Max: 15m 22s
    🟩 cudacxx
      🟩 nvcc12.6 Pass: 100%/1 | Total: 15m 22s | Avg: 15m 22s | Max: 15m 22s
    🟩 cudacxx_family
      🟩 nvcc Pass: 100%/1 | Total: 15m 22s | Avg: 15m 22s | Max: 15m 22s
    🟩 cxx
      🟩 GCC13 Pass: 100%/1 | Total: 15m 22s | Avg: 15m 22s | Max: 15m 22s
    🟩 cxx_family
      🟩 GCC Pass: 100%/1 | Total: 15m 22s | Avg: 15m 22s | Max: 15m 22s
    🟩 gpu
      🟩 v100 Pass: 100%/1 | Total: 15m 22s | Avg: 15m 22s | Max: 15m 22s
    🟩 jobs
      🟩 Test Pass: 100%/1 | Total: 15m 22s | Avg: 15m 22s | Max: 15m 22s
    ```

👃 Inspect Changes

### Modifications in project?

|     | Project |
|-----|---------|
|     | CCCL Infrastructure |
|     | libcu++ |
| +/- | CUB |
|     | Thrust |
|     | CUDA Experimental |
|     | python |
|     | CCCL C Parallel Library |
|     | Catch2Helper |

### Modifications in project or dependencies?

|     | Project |
|-----|---------|
|     | CCCL Infrastructure |
|     | libcu++ |
| +/- | CUB |
| +/- | Thrust |
|     | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃‍ Runner counts (total jobs: 224)

| #   | Runner |
|-----|--------|
| 185 | `linux-amd64-cpu16` |
| 16  | `linux-arm64-cpu16` |
| 14  | `linux-amd64-gpu-v100-latest-1` |
| 9   | `windows-amd64-cpu16` |