Closed tuzz closed 1 year ago
Hi Chris,
thanks for your interest in.
I re-checked the results approximately one month ago on:
Unfortunately, the results still confirm that the SSE2 path performs better than the AVX2 path. However, the performance gap is limited to the YUV-to-RGB conversions (a subset of all the available conversions).
A deeper analysis reveals that these conversions are still suboptimal, so the first task will be to switch to improved versions of them, which in turn may make it possible to remove the intentional fallback.
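For readers unfamiliar with this kind of fallback, here is a minimal sketch of runtime path selection in Rust. It is not the crate's actual code: `SimdPath`, `select_path`, `detect_path` and the `allow_avx2` flag are hypothetical names used only for illustration. The only real API involved is the standard library's `is_x86_feature_detected!` macro.

```rust
/// Instruction-set paths a converter could choose from (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimdPath {
    Scalar,
    Sse2,
    Avx2,
}

/// Picks a path from the reported CPU features.
/// `allow_avx2 = false` models an intentional fallback to SSE2
/// for conversions where the AVX2 path is known to be slower.
pub fn select_path(has_sse2: bool, has_avx2: bool, allow_avx2: bool) -> SimdPath {
    if has_avx2 && allow_avx2 {
        SimdPath::Avx2
    } else if has_sse2 {
        SimdPath::Sse2
    } else {
        SimdPath::Scalar
    }
}

/// Runtime detection on x86 / x86_64 using the standard library macro.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn detect_path(allow_avx2: bool) -> SimdPath {
    select_path(
        std::arch::is_x86_feature_detected!("sse2"),
        std::arch::is_x86_feature_detected!("avx2"),
        allow_avx2,
    )
}

/// Other architectures: only the scalar path is available in this sketch.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn detect_path(_allow_avx2: bool) -> SimdPath {
    SimdPath::Scalar
}
```

With this shape, removing the fallback later would amount to passing `allow_avx2 = true` for the YUV-to-RGB conversions once their AVX2 versions are faster.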
Regarding AVX-512, we should first determine whether the conversions are memory-bound or CPU-bound: if they turn out to be memory-bound, adopting AVX-512 will be almost useless (unless the instruction set contains some magic instruction devoted to color conversion).
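One rough way to approach the memory-bound question is to turn a measured pixel throughput into an implied memory traffic figure and compare it with the machine's sustained bandwidth. The sketch below assumes 8-bit samples, counts only the bytes read from the source planes and written to the destination, and ignores write-allocate traffic and caching. The `Traffic` type, the constants and the function names are hypothetical, and the peak bandwidth must be measured separately on the target machine.

```rust
/// Bytes moved per pixel for one conversion (8-bit samples assumed).
pub struct Traffic {
    pub read_bytes_per_px: f64,
    pub write_bytes_per_px: f64,
}

/// NV12 and I420 store 1 luma byte plus 0.5 chroma bytes per pixel; BGRA is 4 bytes.
pub const NV12_TO_BGRA: Traffic = Traffic { read_bytes_per_px: 1.5, write_bytes_per_px: 4.0 };
pub const I420_TO_BGRA: Traffic = Traffic { read_bytes_per_px: 1.5, write_bytes_per_px: 4.0 };
/// I444 stores 3 full-resolution planes, so 3 bytes per pixel.
pub const I444_TO_BGRA: Traffic = Traffic { read_bytes_per_px: 3.0, write_bytes_per_px: 4.0 };

/// Implied memory traffic in GB/s for a throughput given in Gpx/s.
pub fn memory_traffic_gb_per_s(t: &Traffic, gpx_per_s: f64) -> f64 {
    (t.read_bytes_per_px + t.write_bytes_per_px) * gpx_per_s
}

/// Fraction of a separately measured peak bandwidth that the conversion uses.
/// Values close to 1.0 would suggest a memory-bound kernel.
pub fn bandwidth_utilization(t: &Traffic, gpx_per_s: f64, peak_gb_per_s: f64) -> f64 {
    memory_traffic_gb_per_s(t, gpx_per_s) / peak_gb_per_s
}
```

For instance, an NV12-to-BGRA throughput of 1.5957 Gpx/s corresponds to about 8.8 GB/s under these assumptions. Whether that is close to the limit depends entirely on the memory subsystem of the benchmark machine.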
Here's the relevant data (conversions that result in a pessimization):
Windows
bgr /i444 4480x2498/25 time: [19.540 ms 20.401 ms 21.452 ms]
thrpt: [521.67 Mpx/s 548.55 Mpx/s 572.71 Mpx/s ]
time: [-0.5898% +3.8822% +9.3907%] (p = 0.13 > 0.05)
thrpt: [-8.5845% -3.7371% +0.5933%]
nv12/bgra 352x188 /18 time: [38.431 µs 38.586 µs 38.767 µs]
thrpt: [1.7070 Gpx/s 1.7150 Gpx/s 1.7219 Gpx/s ]
time: [+23.047% +23.617% +24.251%] (p = 0.00 < 0.05)
thrpt: [-19.518% -19.105% -18.730%]
nv12/bgra 512x256 /19 time: [74.357 µs 74.456 µs 74.559 µs]
thrpt: [1.7580 Gpx/s 1.7604 Gpx/s 1.7627 Gpx/s ]
time: [+18.471% +18.697% +18.931%] (p = 0.00 < 0.05)
thrpt: [-15.918% -15.752% -15.591%]
nv12/bgra 704x374 /20 time: [149.87 µs 150.72 µs 152.02 µs]
thrpt: [1.7320 Gpx/s 1.7470 Gpx/s 1.7568 Gpx/s ]
time: [+17.259% +18.189% +19.406%] (p = 0.00 < 0.05)
thrpt: [-16.252% -15.390% -14.719%]
nv12/bgra 992x530 /21 time: [317.04 µs 318.27 µs 319.61 µs]
thrpt: [1.6450 Gpx/s 1.6519 Gpx/s 1.6583 Gpx/s ]
time: [+3.2208% +4.7924% +6.4036%] (p = 0.00 < 0.05)
thrpt: [-6.0182% -4.5733% -3.1203%]
nv12/bgra 1376x764 /22 time: [680.16 µs 686.95 µs 694.13 µs]
thrpt: [1.5145 Gpx/s 1.5303 Gpx/s 1.5456 Gpx/s ]
time: [-0.7787% +0.8935% +2.5998%] (p = 0.31 > 0.05)
thrpt: [-2.5339% -0.8855% +0.7848%]
nv12/bgra 1952x1076/23 time: [1.5661 ms 1.5948 ms 1.6261 ms]
thrpt: [1.2916 Gpx/s 1.3170 Gpx/s 1.3411 Gpx/s ]
time: [+12.178% +14.485% +17.409%] (p = 0.00 < 0.05)
thrpt: [-14.828% -12.653% -10.856%]
nv12/bgra 2752x1526/24 time: [2.8180 ms 2.8374 ms 2.8575 ms]
thrpt: [1.4697 Gpx/s 1.4801 Gpx/s 1.4903 Gpx/s ]
time: [+7.2238% +8.4282% +9.6579%] (p = 0.00 < 0.05)
thrpt: [-8.8073% -7.7731% -6.7371%]
nv12/bgra 3872x2168/25 time: [5.3382 ms 5.3628 ms 5.3912 ms]
thrpt: [1.5571 Gpx/s 1.5653 Gpx/s 1.5725 Gpx/s ]
time: [+11.563% +12.332% +13.139%] (p = 0.00 < 0.05)
thrpt: [-11.613% -10.978% -10.365%]
nv12/bgra 5472x3068/26 time: [10.488 ms 10.521 ms 10.555 ms]
thrpt: [1.5905 Gpx/s 1.5957 Gpx/s 1.6007 Gpx/s ]
time: [+6.0331% +8.2843% +10.281%] (p = 0.00 < 0.05)
thrpt: [-9.3225% -7.6505% -5.6898%]
i420/bgra 352x188 /18 time: [42.680 µs 42.782 µs 42.887 µs]
thrpt: [1.5430 Gpx/s 1.5468 Gpx/s 1.5505 Gpx/s ]
time: [+35.618% +36.280% +36.874%] (p = 0.00 < 0.05)
thrpt: [-26.940% -26.622% -26.264%]
i420/bgra 512x256 /19 time: [83.652 µs 83.907 µs 84.260 µs]
thrpt: [1.5556 Gpx/s 1.5621 Gpx/s 1.5669 Gpx/s ]
time: [+35.030% +35.623% +36.268%] (p = 0.00 < 0.05)
thrpt: [-26.615% -26.266% -25.942%]
i420/bgra 704x374 /20 time: [164.84 µs 165.17 µs 165.52 µs]
thrpt: [1.5908 Gpx/s 1.5941 Gpx/s 1.5973 Gpx/s ]
time: [+28.840% +29.416% +29.984%] (p = 0.00 < 0.05)
thrpt: [-23.067% -22.730% -22.384%]
i420/bgra 992x530 /21 time: [346.65 µs 347.91 µs 349.44 µs]
thrpt: [1.5046 Gpx/s 1.5112 Gpx/s 1.5167 Gpx/s ]
time: [+32.793% +33.639% +34.472%] (p = 0.00 < 0.05)
thrpt: [-25.635% -25.172% -24.695%]
i420/bgra 1376x764 /22 time: [736.44 µs 740.97 µs 745.77 µs]
thrpt: [1.4096 Gpx/s 1.4188 Gpx/s 1.4275 Gpx/s ]
time: [+15.129% +17.069% +19.027%] (p = 0.00 < 0.05)
thrpt: [-15.985% -14.581% -13.141%]
i420/bgra 1952x1076/23 time: [1.5742 ms 1.5865 ms 1.5993 ms]
thrpt: [1.3133 Gpx/s 1.3239 Gpx/s 1.3343 Gpx/s ]
time: [+13.425% +14.526% +15.621%] (p = 0.00 < 0.05)
thrpt: [-13.511% -12.684% -11.836%]
i420/bgra 2752x1526/24 time: [3.0761 ms 3.0963 ms 3.1165 ms]
thrpt: [1.3475 Gpx/s 1.3563 Gpx/s 1.3652 Gpx/s ]
time: [+20.672% +21.912% +23.153%] (p = 0.00 < 0.05)
thrpt: [-18.800% -17.974% -17.131%]
i420/bgra 3872x2168/25 time: [5.9519 ms 5.9750 ms 5.9988 ms]
thrpt: [1.3994 Gpx/s 1.4049 Gpx/s 1.4104 Gpx/s ]
time: [+24.367% +25.269% +26.104%] (p = 0.00 < 0.05)
thrpt: [-20.700% -20.172% -19.593%]
i420/bgra 5472x3068/26 time: [11.700 ms 11.739 ms 11.783 ms]
thrpt: [1.4247 Gpx/s 1.4302 Gpx/s 1.4349 Gpx/s ]
time: [+23.294% +24.191% +25.020%] (p = 0.00 < 0.05)
thrpt: [-20.013% -19.479% -18.893%]
i444/bgra 352x188 /18 time: [47.871 µs 48.041 µs 48.328 µs]
thrpt: [1.3693 Gpx/s 1.3775 Gpx/s 1.3824 Gpx/s ]
time: [+5.9569% +6.5602% +7.3039%] (p = 0.00 < 0.05)
thrpt: [-6.8067% -6.1563% -5.6220%]
i444/bgra 512x256 /19 time: [95.078 µs 95.281 µs 95.532 µs]
thrpt: [1.3720 Gpx/s 1.3756 Gpx/s 1.3786 Gpx/s ]
time: [+5.6854% +6.5524% +7.2634%] (p = 0.00 < 0.05)
thrpt: [-6.7716% -6.1495% -5.3795%]
i444/bgra 704x374 /20 time: [191.23 µs 191.78 µs 192.36 µs]
thrpt: [1.3687 Gpx/s 1.3729 Gpx/s 1.3769 Gpx/s ]
time: [+7.2774% +7.8852% +8.4043%] (p = 0.00 < 0.05)
thrpt: [-7.7527% -7.3089% -6.7838%]
i444/bgra 992x530 /21 time: [390.25 µs 392.22 µs 394.46 µs]
thrpt: [1.3328 Gpx/s 1.3405 Gpx/s 1.3473 Gpx/s ]
time: [+8.7713% +9.4991% +10.234%] (p = 0.00 < 0.05)
thrpt: [-9.2839% -8.6751% -8.0640%]
i444/bgra 1376x764 /22 time: [785.61 µs 788.89 µs 792.16 µs]
thrpt: [1.3271 Gpx/s 1.3326 Gpx/s 1.3382 Gpx/s ]
time: [+4.5342% +5.2933% +6.0305%] (p = 0.00 < 0.05)
thrpt: [-5.6875% -5.0272% -4.3375%]
i444/bgra 1952x1076/23 time: [1.6170 ms 1.6221 ms 1.6271 ms]
thrpt: [1.2908 Gpx/s 1.2948 Gpx/s 1.2989 Gpx/s ]
time: [+3.6431% +4.9288% +5.9022%] (p = 0.00 < 0.05)
thrpt: [-5.5732% -4.6973% -3.5151%]
i444/bgra 2752x1526/24 time: [3.2354 ms 3.2476 ms 3.2608 ms]
thrpt: [1.2879 Gpx/s 1.2931 Gpx/s 1.2980 Gpx/s ]
time: [+4.6876% +5.5025% +6.3062%] (p = 0.00 < 0.05)
thrpt: [-5.9321% -5.2155% -4.4777%]
i444/bgra 3872x2168/25 time: [6.4314 ms 6.4544 ms 6.4790 ms]
thrpt: [1.2957 Gpx/s 1.3006 Gpx/s 1.3052 Gpx/s ]
time: [+3.9724% +4.8604% +5.7326%] (p = 0.00 < 0.05)
thrpt: [-5.4218% -4.6351% -3.8206%]
i444/bgra 5472x3068/26 time: [12.903 ms 12.950 ms 13.005 ms]
thrpt: [1.2909 Gpx/s 1.2964 Gpx/s 1.3011 Gpx/s ]
time: [+6.0281% +6.4985% +6.9932%] (p = 0.00 < 0.05)
thrpt: [-6.5361% -6.1020% -5.6854%]
Linux
bgra/i444 352x188 /18 time: [86.322 µs 86.409 µs 86.534 µs]
thrpt: [764.74 Mpx/s 765.84 Mpx/s 766.62 Mpx/s ]
time: [-0.0076% +0.1640% +0.3515%] (p = 0.08 > 0.05)
thrpt: [-0.3503% -0.1637% +0.0076%]
bgra/i444 512x256 /19 time: [172.02 µs 172.14 µs 172.28 µs]
thrpt: [760.81 Mpx/s 761.42 Mpx/s 761.94 Mpx/s ]
time: [+0.4655% +0.6177% +0.7668%] (p = 0.00 < 0.05)
thrpt: [-0.7610% -0.6139% -0.4633%]
bgra/i444 704x374 /20 time: [345.63 µs 345.93 µs 346.25 µs]
thrpt: [760.42 Mpx/s 761.11 Mpx/s 761.78 Mpx/s ]
time: [+0.0779% +0.2617% +0.4370%] (p = 0.00 < 0.05)
thrpt: [-0.4351% -0.2610% -0.0779%]
bgra/i444 1376x764 /22 time: [1.4079 ms 1.4150 ms 1.4234 ms]
thrpt: [738.56 Mpx/s 742.94 Mpx/s 746.69 Mpx/s ]
time: [-0.3074% +0.2414% +0.8748%] (p = 0.43 > 0.05)
thrpt: [-0.8672% -0.2409% +0.3083%]
bgra/i444 5472x3068/26 time: [22.388 ms 22.414 ms 22.440 ms]
thrpt: [748.13 Mpx/s 749.00 Mpx/s 749.86 Mpx/s ]
time: [-0.2295% -0.0904% +0.0696%] (p = 0.23 > 0.05)
thrpt: [-0.0695% +0.0904% +0.2300%]
bgr /i444 416x212 /18 time: [145.32 µs 145.41 µs 145.51 µs]
thrpt: [606.09 Mpx/s 606.50 Mpx/s 606.90 Mpx/s ]
time: [+0.8975% +1.0383% +1.1769%] (p = 0.00 < 0.05)
thrpt: [-1.1632% -1.0276% -0.8895%]
bgr /i444 576x304 /19 time: [288.08 µs 288.28 µs 288.51 µs]
thrpt: [606.93 Mpx/s 607.40 Mpx/s 607.84 Mpx/s ]
time: [+0.0069% +0.1166% +0.2297%] (p = 0.05 < 0.05)
thrpt: [-0.2292% -0.1165% -0.0069%]
bgr /i444 800x438 /20 time: [575.17 µs 575.51 µs 575.86 µs]
thrpt: [608.48 Mpx/s 608.85 Mpx/s 609.21 Mpx/s ]
time: [+0.5930% +0.7199% +0.8441%] (p = 0.00 < 0.05)
thrpt: [-0.8370% -0.7148% -0.5895%]
bgr /i444 1120x626 /21 time: [1.1561 ms 1.1570 ms 1.1580 ms]
thrpt: [605.47 Mpx/s 605.97 Mpx/s 606.45 Mpx/s ]
time: [+0.7744% +0.9169% +1.0498%] (p = 0.00 < 0.05)
thrpt: [-1.0389% -0.9086% -0.7684%]
bgr /i444 1600x874 /22 time: [2.3153 ms 2.3173 ms 2.3196 ms]
thrpt: [602.87 Mpx/s 603.46 Mpx/s 603.99 Mpx/s ]
time: [-0.0072% +0.1236% +0.2548%] (p = 0.07 > 0.05)
thrpt: [-0.2542% -0.1234% +0.0072%]
bgr /i444 2240x1250/23 time: [4.7033 ms 4.7063 ms 4.7093 ms]
thrpt: [594.57 Mpx/s 594.95 Mpx/s 595.32 Mpx/s ]
time: [+1.9482% +2.0462% +2.1409%] (p = 0.00 < 0.05)
thrpt: [-2.0960% -2.0051% -1.9110%]
bgr /i444 3168x1766/24 time: [9.3864 ms 9.3942 ms 9.4019 ms]
thrpt: [595.06 Mpx/s 595.54 Mpx/s 596.04 Mpx/s ]
time: [+2.1028% +2.2331% +2.3620%] (p = 0.00 < 0.05)
thrpt: [-2.3075% -2.1843% -2.0595%]
bgr /i444 4480x2498/25 time: [18.634 ms 18.649 ms 18.664 ms]
thrpt: [599.60 Mpx/s 600.09 Mpx/s 600.58 Mpx/s ]
time: [+0.9236% +1.2752% +1.5192%] (p = 0.00 < 0.05)
thrpt: [-1.4965% -1.2591% -0.9152%]
bgr /i444 6336x3532/26 time: [37.246 ms 37.281 ms 37.316 ms]
thrpt: [599.71 Mpx/s 600.28 Mpx/s 600.84 Mpx/s ]
time: [+1.2940% +1.4522% +1.6077%] (p = 0.00 < 0.05)
thrpt: [-1.5822% -1.4314% -1.2775%]
nv12/bgra 352x188 /18 time: [36.754 µs 36.784 µs 36.816 µs]
thrpt: [1.7975 Gpx/s 1.7991 Gpx/s 1.8005 Gpx/s ]
time: [+35.099% +35.230% +35.373%] (p = 0.00 < 0.05)
thrpt: [-26.130% -26.052% -25.980%]
nv12/bgra 512x256 /19 time: [73.447 µs 73.550 µs 73.658 µs]
thrpt: [1.7795 Gpx/s 1.7821 Gpx/s 1.7846 Gpx/s ]
time: [+34.791% +35.029% +35.252%] (p = 0.00 < 0.05)
thrpt: [-26.064% -25.942% -25.811%]
nv12/bgra 704x374 /20 time: [148.28 µs 148.47 µs 148.67 µs]
thrpt: [1.7710 Gpx/s 1.7734 Gpx/s 1.7757 Gpx/s ]
time: [+34.585% +34.782% +34.993%] (p = 0.00 < 0.05)
thrpt: [-25.922% -25.806% -25.697%]
nv12/bgra 992x530 /21 time: [299.42 µs 299.79 µs 300.18 µs]
thrpt: [1.7515 Gpx/s 1.7537 Gpx/s 1.7560 Gpx/s ]
time: [+35.693% +35.932% +36.151%] (p = 0.00 < 0.05)
thrpt: [-26.552% -26.434% -26.304%]
nv12/bgra 1376x764 /22 time: [627.67 µs 632.36 µs 637.38 µs]
thrpt: [1.6494 Gpx/s 1.6624 Gpx/s 1.6749 Gpx/s ]
time: [+29.852% +32.055% +34.027%] (p = 0.00 < 0.05)
thrpt: [-25.388% -24.274% -22.990%]
nv12/bgra 1952x1076/23 time: [1.4352 ms 1.4396 ms 1.4441 ms]
thrpt: [1.4544 Gpx/s 1.4590 Gpx/s 1.4634 Gpx/s ]
time: [+10.833% +11.472% +12.115%] (p = 0.00 < 0.05)
thrpt: [-10.806% -10.291% -9.7739%]
nv12/bgra 2752x1526/24 time: [2.7450 ms 2.7503 ms 2.7555 ms]
thrpt: [1.5240 Gpx/s 1.5270 Gpx/s 1.5299 Gpx/s ]
time: [+15.944% +16.599% +17.266%] (p = 0.00 < 0.05)
thrpt: [-14.724% -14.236% -13.751%]
nv12/bgra 3872x2168/25 time: [5.4398 ms 5.4537 ms 5.4678 ms]
thrpt: [1.5353 Gpx/s 1.5392 Gpx/s 1.5432 Gpx/s ]
time: [+22.379% +23.042% +23.707%] (p = 0.00 < 0.05)
thrpt: [-19.164% -18.727% -18.287%]
nv12/bgra 5472x3068/26 time: [10.346 ms 10.381 ms 10.418 ms]
thrpt: [1.6115 Gpx/s 1.6171 Gpx/s 1.6226 Gpx/s ]
time: [+20.529% +21.084% +21.622%] (p = 0.00 < 0.05)
thrpt: [-17.778% -17.413% -17.032%]
i420/bgra 352x188 /18 time: [41.028 µs 41.059 µs 41.094 µs]
thrpt: [1.6103 Gpx/s 1.6117 Gpx/s 1.6130 Gpx/s ]
time: [+48.898% +49.187% +49.401%] (p = 0.00 < 0.05)
thrpt: [-33.066% -32.970% -32.840%]
i420/bgra 512x256 /19 time: [81.559 µs 81.642 µs 81.727 µs]
thrpt: [1.6038 Gpx/s 1.6055 Gpx/s 1.6071 Gpx/s ]
time: [+47.322% +47.542% +47.746%] (p = 0.00 < 0.05)
thrpt: [-32.316% -32.223% -32.122%]
i420/bgra 704x374 /20 time: [164.85 µs 165.10 µs 165.36 µs]
thrpt: [1.5923 Gpx/s 1.5948 Gpx/s 1.5972 Gpx/s ]
time: [+48.143% +48.482% +48.808%] (p = 0.00 < 0.05)
thrpt: [-32.799% -32.652% -32.498%]
i420/bgra 992x530 /21 time: [329.59 µs 330.17 µs 330.78 µs]
thrpt: [1.5895 Gpx/s 1.5924 Gpx/s 1.5952 Gpx/s ]
time: [+47.295% +47.672% +48.058%] (p = 0.00 < 0.05)
thrpt: [-32.459% -32.282% -32.109%]
i420/bgra 1376x764 /22 time: [684.11 µs 687.04 µs 690.16 µs]
thrpt: [1.5232 Gpx/s 1.5301 Gpx/s 1.5367 Gpx/s ]
time: [+42.203% +43.986% +45.671%] (p = 0.00 < 0.05)
thrpt: [-31.352% -30.549% -29.678%]
i420/bgra 1952x1076/23 time: [1.5377 ms 1.5439 ms 1.5503 ms]
thrpt: [1.3548 Gpx/s 1.3604 Gpx/s 1.3659 Gpx/s ]
time: [+20.871% +21.703% +22.596%] (p = 0.00 < 0.05)
thrpt: [-18.431% -17.833% -17.268%]
i420/bgra 2752x1526/24 time: [2.9313 ms 2.9384 ms 2.9457 ms]
thrpt: [1.4257 Gpx/s 1.4292 Gpx/s 1.4326 Gpx/s ]
time: [+26.497% +27.056% +27.612%] (p = 0.00 < 0.05)
thrpt: [-21.638% -21.294% -20.946%]
i420/bgra 3872x2168/25 time: [5.7126 ms 5.7216 ms 5.7306 ms]
thrpt: [1.4649 Gpx/s 1.4672 Gpx/s 1.4695 Gpx/s ]
time: [+29.500% +30.134% +30.715%] (p = 0.00 < 0.05)
thrpt: [-23.498% -23.156% -22.780%]
i420/bgra 5472x3068/26 time: [11.269 ms 11.285 ms 11.301 ms]
thrpt: [1.4855 Gpx/s 1.4877 Gpx/s 1.4898 Gpx/s ]
time: [+29.734% +30.706% +31.488%] (p = 0.00 < 0.05)
thrpt: [-23.947% -23.492% -22.919%]
i444/bgra 352x188 /18 time: [47.148 µs 47.217 µs 47.292 µs]
thrpt: [1.3993 Gpx/s 1.4015 Gpx/s 1.4036 Gpx/s ]
time: [+41.541% +41.858% +42.156%] (p = 0.00 < 0.05)
thrpt: [-29.655% -29.507% -29.349%]
i444/bgra 512x256 /19 time: [94.290 µs 96.602 µs 99.574 µs]
thrpt: [1.3163 Gpx/s 1.3568 Gpx/s 1.3901 Gpx/s ]
time: [+44.463% +47.971% +52.528%] (p = 0.00 < 0.05)
thrpt: [-34.438% -32.419% -30.778%]
i444/bgra 704x374 /20 time: [188.11 µs 189.54 µs 191.70 µs]
thrpt: [1.3735 Gpx/s 1.3891 Gpx/s 1.3997 Gpx/s ]
time: [+44.267% +45.378% +46.876%] (p = 0.00 < 0.05)
thrpt: [-31.915% -31.214% -30.684%]
i444/bgra 992x530 /21 time: [379.07 µs 381.87 µs 386.67 µs]
thrpt: [1.3597 Gpx/s 1.3768 Gpx/s 1.3870 Gpx/s ]
time: [+45.467% +46.615% +48.562%] (p = 0.00 < 0.05)
thrpt: [-32.688% -31.794% -31.256%]
i444/bgra 1376x764 /22 time: [770.66 µs 772.35 µs 774.09 µs]
thrpt: [1.3581 Gpx/s 1.3611 Gpx/s 1.3641 Gpx/s ]
time: [+40.162% +41.050% +41.881%] (p = 0.00 < 0.05)
thrpt: [-29.518% -29.103% -28.654%]
i444/bgra 1952x1076/23 time: [1.6235 ms 1.6785 ms 1.7435 ms]
thrpt: [1.2047 Gpx/s 1.2513 Gpx/s 1.2937 Gpx/s ]
time: [+26.218% +30.531% +35.251%] (p = 0.00 < 0.05)
thrpt: [-26.064% -23.390% -20.772%]
i444/bgra 2752x1526/24 time: [3.1561 ms 3.2306 ms 3.3220 ms]
thrpt: [1.2642 Gpx/s 1.2999 Gpx/s 1.3306 Gpx/s ]
time: [+27.093% +29.957% +33.693%] (p = 0.00 < 0.05)
thrpt: [-25.202% -23.051% -21.318%]
i444/bgra 3872x2168/25 time: [6.2715 ms 6.2811 ms 6.2907 ms]
thrpt: [1.3344 Gpx/s 1.3365 Gpx/s 1.3385 Gpx/s ]
time: [+27.733% +28.685% +29.566%] (p = 0.00 < 0.05)
thrpt: [-22.820% -22.291% -21.711%]
i444/bgra 5472x3068/26 time: [12.635 ms 12.700 ms 12.813 ms]
thrpt: [1.3102 Gpx/s 1.3219 Gpx/s 1.3287 Gpx/s ]
time: [+28.388% +29.304% +30.591%] (p = 0.00 < 0.05)
thrpt: [-23.425% -22.663% -22.111%]
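As a reading aid for the numbers above: assuming the output follows the Criterion report format, the second `time`/`thrpt` pair of each entry is the relative change against the baseline run, and the two percentages are tied together because throughput is pixels divided by time. The helper names below are hypothetical; the sketch only reproduces the arithmetic.

```rust
/// Throughput in Gpx/s for a frame of `width` x `height` pixels
/// converted in `time_us` microseconds.
pub fn gpx_per_s(width: u32, height: u32, time_us: f64) -> f64 {
    let pixels = f64::from(width) * f64::from(height);
    // px / µs equals Mpx/s; divide by 1000 to get Gpx/s.
    pixels / time_us / 1000.0
}

/// Relative throughput change (in percent) implied by a relative
/// time change (in percent): thrpt' / thrpt = 1 / (time' / time).
pub fn throughput_change_pct(time_change_pct: f64) -> f64 {
    (1.0 / (1.0 + time_change_pct / 100.0) - 1.0) * 100.0
}
```

For the Windows `nv12/bgra 352x188` entry, `gpx_per_s(352, 188, 38.586)` gives roughly 1.715 Gpx/s, and `throughput_change_pct(23.617)` gives roughly -19.1%, in line with the reported figures. A 49% increase in time therefore shows up as a throughput drop of about 33%, not 49%.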
I see. Thanks for re-checking the performance @fabiosky. I'll close this issue now since it doesn't seem like there's anything to change.
It would be great if the crate detected AVX512 support and used it if the CPU supports it.
Do you know if the AVX2 slowdown on AMD platforms is still an issue on modern CPUs?
Thanks