qa_volk_16ic_x2_dot_prod_16ic is flaky

Example of a failed CI run: https://github.com/gnuradio/volk/actions/runs/6606089847/job/17941931107

Failures can be reproduced locally with:

ctest -R qa_volk_16ic_x2_dot_prod_16ic --output-on-failure --repeat until-fail:100000

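My guess is that the kernels handle 16-bit overflow differently (modular, i.e. wrapping, vs. saturating arithmetic), so their results diverge as soon as an intermediate sum leaves the int16 range. One of the two compared values in the log below is -32768 (INT16_MIN), which is what a saturated result would look like.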
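To make the suspected mismatch concrete, here is a minimal sketch of the two overflow behaviours for a 16-bit accumulator. The helper names (add_wrap16, add_sat16, accumulate_wrap, accumulate_sat) are hypothetical and are not taken from the VOLK sources; the sketch only illustrates the arithmetic, not any actual kernel.

```c
#include <stdint.h>

/* Hypothetical helper: modular (wrapping) 16-bit addition.
 * The sum is formed in unsigned arithmetic and mapped back to int16_t
 * explicitly, so no signed overflow is involved. */
static int16_t add_wrap16(int16_t a, int16_t b)
{
    uint16_t sum = (uint16_t)((uint16_t)a + (uint16_t)b);
    return (int16_t)(sum >= 0x8000u ? (int32_t)sum - 0x10000 : (int32_t)sum);
}

/* Hypothetical helper: saturating 16-bit addition.
 * The sum is formed in 32 bits and clamped to the int16_t range. */
static int16_t add_sat16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Add the same value n times with a wrapping accumulator. */
static int16_t accumulate_wrap(int16_t value, int n)
{
    int16_t acc = 0;
    for (int i = 0; i < n; i++) acc = add_wrap16(acc, value);
    return acc;
}

/* Add the same value n times with a saturating accumulator. */
static int16_t accumulate_sat(int16_t value, int n)
{
    int16_t acc = 0;
    for (int i = 0; i < n; i++) acc = add_sat16(acc, value);
    return acc;
}
```

Adding -10000 four times gives 25536 with the wrapping accumulator and -32768 with the saturating one, while both agree (-30000) after three additions. A test with zero tolerance would therefore pass or fail depending only on whether the input happens to overflow.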

The test logic intentionally uses very small integers to avoid overflow/saturation conditions, but this is not foolproof when the input vectors are long:

https://github.com/gnuradio/volk/blob/42f57cd67506e7fb6a7795af3948b803de0085f4/lib/qa_utils.cc#L75-L77
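As a rough sanity check on why small inputs do not rule out overflow, the sketch below computes a worst-case bound on one component of the accumulated dot product. The parameter max_abs is a hypothetical bound on the magnitude of each real and imaginary input component; the actual range is whatever the linked qa_utils.cc lines generate. The function names are also made up for this illustration.

```c
#include <stdint.h>

/* Each real or imaginary part of one complex product is a sum or difference
 * of two integer products, so its magnitude is at most 2 * max_abs^2.
 * Over n samples the accumulated magnitude is at most n times that. */
static int64_t worst_case_sum(int64_t max_abs, int64_t n)
{
    return 2 * max_abs * max_abs * n;
}

/* Largest vector length for which the worst case still fits in int16_t. */
static int64_t max_safe_length(int64_t max_abs)
{
    if (max_abs == 0) return INT64_MAX; /* all-zero input never overflows */
    return INT16_MAX / (2 * max_abs * max_abs);
}
```

Assuming the first number in the RUN_VOLK_TESTS log line is the vector length, n = 131071. Even with max_abs = 1 the worst case is 262142, well above INT16_MAX, and the overflow-free length would be only 16383. Random inputs rarely come close to the worst case, which would fit a failure that shows up only occasionally.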

I think the kernels should be changed so that they all use the same overflow behaviour. (Saturating arithmetic everywhere?)
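One caveat with saturating arithmetic everywhere: saturating addition is not associative, so a kernel that accumulates in several lanes and combines them at the end can still disagree with a sequential loop. The sketch below uses hypothetical helpers (sat_add16, sum_sequential, sum_two_lanes) and a two-lane split chosen purely for illustration; it does not reproduce the lane layout of any actual VOLK kernel.

```c
#include <stdint.h>

/* Hypothetical helper: saturating 16-bit addition. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Sequential accumulation: ((x[0] + x[1]) + x[2]) + ... */
static int16_t sum_sequential(const int16_t* x, int n)
{
    int16_t acc = 0;
    for (int i = 0; i < n; i++) acc = sat_add16(acc, x[i]);
    return acc;
}

/* Two-lane accumulation, loosely imitating a SIMD kernel: even indices go
 * to lane 0, odd indices to lane 1, and the lanes are combined at the end. */
static int16_t sum_two_lanes(const int16_t* x, int n)
{
    int16_t lane0 = 0, lane1 = 0;
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0)
            lane0 = sat_add16(lane0, x[i]);
        else
            lane1 = sat_add16(lane1, x[i]);
    }
    return sat_add16(lane0, lane1);
}
```

For the input {30000, -10000, 10000, 0} the sequential sum is 30000, but the two-lane sum is 22767 because lane 0 saturates at 32767 before the negative term is added. If exact agreement is required, the summation order (or a wider accumulator) probably has to be part of the kernel's definition as well.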

Log:

18: RUN_VOLK_TESTS: volk_16ic_x2_dot_prod_16ic(131071,1)
18: generic completed in 0.304401 ms
18: a_sse2 completed in 0.0966 ms
18: u_sse2 completed in 0.0554 ms
18: u_avx2 completed in 0.0587 ms
18: a_avx2 completed in 0.0393 ms
18: offset 0 in1: -32533 in2: -32768 tolerance was: 0
18: volk_16ic_x2_dot_prod_16ic: fail on arch a_sse2
18: offset 0 in1: -32533 in2: -32768 tolerance was: 0
18: volk_16ic_x2_dot_prod_16ic: fail on arch u_sse2
18: offset 0 in1: -32533 in2: -32768 tolerance was: 0
18: volk_16ic_x2_dot_prod_16ic: fail on arch u_avx2
18: offset 0 in1: -32533 in2: -32768 tolerance was: 0
18: volk_16ic_x2_dot_prod_16ic: fail on arch a_avx2

qa_volk_16ic_x2_dot_prod_16ic is flaky #676