Closed · smu160 closed 1 week ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 99.16%. Comparing base (`20fafba`) to head (`713b3a6`).
:umbrella: View full report in Codecov by Sentry.
Hi @calebzulawski,
We now have the basics of what we'd need to support signals that are stored as an array of complex numbers. Looking forward to hearing your thoughts!
Best, Saveliy
Something like this looks good to me! Are you open to possibly using a more trait-oriented design in the future? I wouldn't want to hold up this functionality, but I think various input and output types could be combined into one generic function.
Hi @calebzulawski,
Thank you for the review! I think your trait-oriented solution would be great. Just to clarify, do we want to wait on merging this PR? Or do we want this PR as an interim solution?
Best, Saveliy
I would probably merge it to have something to start from; just something to consider before a 1.0 release.
I've taken the liberty of rewriting `separate_re_im` and pushing to your branch.
I couldn't get it to vectorize on godbolt, but at least it should have the right public API, and we can always optimize it later.
I tried a bunch more things and couldn't get it to autovectorize: https://godbolt.org/z/7Kd8v4vPG
We might need something like `bytemuck` here to view a `Complex` as a `&[T; 2]`, or even better `&[Complex<T>]` as `&[T]` with double the length.
I think the approach I would take is: iterate by chunks, `bytemuck` to `[T; N * 2]`, `Simd::deinterleave`, wrapped in `#[multiversion]`.
Yeah that sounds good to me
I am working on a `simd-complex` crate; this seems like an important enough algorithm to include there as an AoSoA <-> AoS conversion.
@Shnatsel
I have a rough version of `bytemuck` + `Simd` deinterleave on godbolt. Let me know your thoughts. Thank you!
I've taken a look at the Godbolt version. Wow, that is some aggressive loop unrolling! It looks good to me though.
One more thing to try: we need to benchmark whether iterating over `vec![0; n]` is faster or slower than `Vec::with_capacity(n)` plus `Vec::extend_from_slice(deinterleaved.to_array())` in the main loop. The latter incurs slightly more instructions but avoids a memset to zero out both vectors.
@Shnatsel I added the implementation using `bytemuck` + portable SIMD deinterleave. The solution in godbolt had a bounds check, so that was removed as well. You can see the difference in godbolt.
@Shnatsel
> One more thing to try: we need to benchmark if iterating over `vec![0; n]` is faster or slower than `Vec::with_capacity(n)` plus `Vec::extend_from_slice(deinterleaved.to_array())` in the main loop. The latter incurs slightly more instructions but avoids a memset to zero out both vectors.
I added a benchmark using criterion to test this idea. The benchmark is probably not leveraging criterion to its full potential, so let me know.
Running benches/bench.rs (target/release/deps/bench-3ef076ccef6e83be)
Gnuplot not found, using plotters backend
separate_re_im/with_zeroed_vecs/1024
time: [405.64 ns 408.02 ns 411.05 ns]
thrpt: [2.4912 Gelem/s 2.5097 Gelem/s 2.5244 Gelem/s]
Found 16 outliers among 100 measurements (16.00%)
3 (3.00%) high mild
13 (13.00%) high severe
separate_re_im/with_capacity/1024
time: [422.60 ns 425.43 ns 428.32 ns]
thrpt: [2.3907 Gelem/s 2.4070 Gelem/s 2.4231 Gelem/s]
Found 18 outliers among 100 measurements (18.00%)
1 (1.00%) low mild
2 (2.00%) high mild
15 (15.00%) high severe
separate_re_im/with_zeroed_vecs/4096
time: [1.9314 µs 1.9407 µs 1.9508 µs]
thrpt: [2.0996 Gelem/s 2.1106 Gelem/s 2.1207 Gelem/s]
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) low mild
5 (5.00%) high mild
1 (1.00%) high severe
separate_re_im/with_capacity/4096
time: [1.7073 µs 1.7213 µs 1.7380 µs]
thrpt: [2.3567 Gelem/s 2.3795 Gelem/s 2.3991 Gelem/s]
Found 19 outliers among 100 measurements (19.00%)
17 (17.00%) low mild
2 (2.00%) high severe
separate_re_im/with_zeroed_vecs/16384
time: [8.1350 µs 8.4114 µs 8.7453 µs]
thrpt: [1.8735 Gelem/s 1.9478 Gelem/s 2.0140 Gelem/s]
separate_re_im/with_capacity/16384
time: [8.5694 µs 8.6258 µs 8.7400 µs]
thrpt: [1.8746 Gelem/s 1.8994 Gelem/s 1.9119 Gelem/s]
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
separate_re_im/with_zeroed_vecs/65536
time: [32.902 µs 33.322 µs 33.821 µs]
thrpt: [1.9377 Gelem/s 1.9668 Gelem/s 1.9918 Gelem/s]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
separate_re_im/with_capacity/65536
time: [39.719 µs 39.987 µs 40.303 µs]
thrpt: [1.6261 Gelem/s 1.6389 Gelem/s 1.6500 Gelem/s]
Found 27 outliers among 100 measurements (27.00%)
3 (3.00%) low severe
11 (11.00%) low mild
3 (3.00%) high mild
10 (10.00%) high severe
separate_re_im/with_zeroed_vecs/262144
time: [243.93 µs 247.17 µs 251.19 µs]
thrpt: [1.0436 Gelem/s 1.0606 Gelem/s 1.0747 Gelem/s]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
separate_re_im/with_capacity/262144
time: [276.91 µs 279.97 µs 283.36 µs]
thrpt: [925.12 Melem/s 936.31 Melem/s 946.67 Melem/s]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
Benchmarking separate_re_im/with_zeroed_vecs/1048576: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 60.
separate_re_im/with_zeroed_vecs/1048576
time: [1.2126 ms 1.2465 ms 1.2828 ms]
thrpt: [817.42 Melem/s 841.24 Melem/s 864.72 Melem/s]
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) low mild
1 (1.00%) high mild
13 (13.00%) high severe
Benchmarking separate_re_im/with_capacity/1048576: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.4s, enable flat sampling, or reduce sample count to 50.
separate_re_im/with_capacity/1048576
time: [1.4784 ms 1.4848 ms 1.4942 ms]
thrpt: [701.76 Melem/s 706.22 Melem/s 709.26 Melem/s]
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
Edit: Added in results
@calebzulawski @Shnatsel
A draft PR to address issue #22.