Closed NobodyXu closed 3 years ago
I tried this approach before and find that it inhibited auto-vectorization for some reason. But maybe your latest change changes that?
Can you benchmark with and without this change? Something like (on Linux) taskset 0x01 cargo criterion --bench pushgen_dedup_flatten_filter_map
(and pushgen_flatten_dedup_filter_map
)
If not I can do that, hopefully later today. Would be much appreciated!
I am free and on my linux laptop right now, so I will do this right away.
Benchmark result after this commit:
pushgen_dedup_flatten_filter_map
time: [437.56 us 440.42 us 443.86 us]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
Before this commit:
pushgen_dedup_flatten_filter_map
time: [256.45 us 258.42 us 260.88 us]
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) low mild
5 (5.00%) high mild
9 (9.00%) high severe
Seems that it is true that without assignment, some optimization is hindered.
Yah, seems that way. So I think we put this change on hold for now
Maybe just add a comment in the source code regarding this?
Sure
Updated the PR and forced push the commits
Thanks!!
Signed-off-by: Jiahao XU Jiahao_XU@outlook.com