Open Dandandan opened 2 years ago
I'll play with these flags locally and keep you posted on the impact.
@Dandandan
I've done the following to build the wheel:
export RUSTFLAGS='-C target-feature=+fxsr,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+aes,+avx,+avx2' && maturin build --release
Then I just reinstalled the wheel and reran the benchmark, which produced the following:
q1: 0.043521209000000116
q2: 0.4907338750000001
q3: 2.0281409170000004
q4: 0.03750329200000024
q5: 2.112818584
q6: 2.1120300420000007
q7: 2.0400456249999994
q8: 3.093032082999999
q9: 2.1041081250000016
q10: 50.334135208999996
These results were basically in line with the unoptimized build, so I'm wondering if I've done something wrong.
Any thoughts?
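One way to check whether the flags actually reached the compiler is to ask the compiled code which target features it was built with. The following is a minimal sketch, not something from this thread: `compiled_features` is a hypothetical helper, and it only reports on the crate it is compiled into, so it would need to be built with the same RUSTFLAGS as the wheel.

```rust
/// Returns the x86 target features (from the RUSTFLAGS list above) that were
/// enabled at compile time for this crate. `cfg!` is resolved by the compiler,
/// so this reflects build flags, not the CPU the program happens to run on.
fn compiled_features() -> Vec<&'static str> {
    let checks = [
        ("fxsr", cfg!(target_feature = "fxsr")),
        ("sse", cfg!(target_feature = "sse")),
        ("sse2", cfg!(target_feature = "sse2")),
        ("sse3", cfg!(target_feature = "sse3")),
        ("ssse3", cfg!(target_feature = "ssse3")),
        ("sse4.1", cfg!(target_feature = "sse4.1")),
        ("sse4.2", cfg!(target_feature = "sse4.2")),
        ("popcnt", cfg!(target_feature = "popcnt")),
        ("aes", cfg!(target_feature = "aes")),
        ("avx", cfg!(target_feature = "avx")),
        ("avx2", cfg!(target_feature = "avx2")),
    ];
    checks
        .iter()
        .filter(|(_, enabled)| *enabled)
        .map(|(name, _)| *name)
        .collect()
}

fn main() {
    println!("compiled with: {:?}", compiled_features());
}
```

If `avx2` and `aes` are missing from the output of a build that was supposed to have them, the flags were probably not picked up. Running `rustc --print cfg -C target-cpu=native` gives a similar view of what `native` resolves to on the local machine.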
@realno FYI
When I tried target-cpu=skylake for roapi, I got 10-20% speed improvements. Just as a quick test, do you get any performance gain with target-cpu=native?
Below is with native and sn-malloc. Some queries are faster, some slower; roughly in line.
q1: 0.05099512500000003
q2: 0.3307659999999999
q3: 1.228696541
q4: 0.062102542000000316
q5: 1.2268319589999996
q6: 1.2571589580000002
q7: 1.1611415420000002
q8: 2.9696968339999996
q9: 0.6929859999999994
q10: 20.191931167
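To make "some faster, some slower" concrete, the two sets of timings posted above can be compared query by query. This is only a rough sketch: the constant and function names are made up for illustration, the second run changes both the CPU target and the allocator, and the thread does not say how many times each query was run, so the ratios are indicative at best.

```rust
/// Timings in seconds for q1..q10, copied from the run built with the
/// explicit target-feature list.
const TARGET_FEATURE_RUN: [f64; 10] = [
    0.043521209000000116, 0.4907338750000001, 2.0281409170000004,
    0.03750329200000024, 2.112818584, 2.1120300420000007,
    2.0400456249999994, 3.093032082999999, 2.1041081250000016,
    50.334135208999996,
];

/// Timings in seconds for q1..q10, copied from the run with
/// target-cpu=native and sn-malloc.
const NATIVE_SNMALLOC_RUN: [f64; 10] = [
    0.05099512500000003, 0.3307659999999999, 1.228696541,
    0.062102542000000316, 1.2268319589999996, 1.2571589580000002,
    1.1611415420000002, 2.9696968339999996, 0.6929859999999994,
    20.191931167,
];

/// Per-query ratio baseline / candidate. A value above 1.0 means the
/// candidate run was faster; below 1.0 means it was slower.
fn speedups(baseline: &[f64], candidate: &[f64]) -> Vec<f64> {
    baseline.iter().zip(candidate).map(|(b, c)| b / c).collect()
}

fn main() {
    let ratios = speedups(&TARGET_FEATURE_RUN, &NATIVE_SNMALLOC_RUN);
    for (i, ratio) in ratios.iter().enumerate() {
        println!("q{}: {:.2}x", i + 1, ratio);
    }
}
```

On these numbers q1 and q4 come out slower in the second run, while the remaining queries come out faster.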
Rust by default compiles towards a very old architecture, which limits performance.
We should probably update this to a newer one. An example of Polars usage:
https://github.com/pola-rs/polars/blob/master/.github/deploy_manylinux.sh#L11
There are some stats over here:
https://store.steampowered.com/hwsurvey
SSE2: 100.00%
SSE3: 100.00%
LAHF / SAHF: 99.99%
CMPXCHG16B: 99.98%
SSSE3: 99.27%
SSE4.1: 98.89%
SSE4.2: 98.50%
FCMOV: 97.23%
NTFS: 96.06%
AES: 95.50%
AVX: 94.38%
AVX2: 86.31%
I think we could maybe enable all features up to avx2 and AES. AES is used by ahash, so enabling it will improve performance in hash joins and hash aggregates. The other features improve overall performance, e.g. in kernels, the parquet reader, and DataFusion code.
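Since the survey numbers suggest a share of machines lack AVX2 or AES, it can be useful to see how a given host compares with the proposed feature set. The sketch below is an illustration only: `missing_features` is a hypothetical helper, and the feature list is my reading of "up to avx2 and AES". It should be built with default flags, because as far as I know `is_x86_feature_detected!` simply reports true for any feature already enabled at compile time.

```rust
/// Returns the proposed features that the CPU running this program lacks.
/// Detection happens at run time via the standard library macro.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn missing_features() -> Vec<&'static str> {
    let checks = [
        ("sse4.2", is_x86_feature_detected!("sse4.2")),
        ("popcnt", is_x86_feature_detected!("popcnt")),
        ("aes", is_x86_feature_detected!("aes")),
        ("avx", is_x86_feature_detected!("avx")),
        ("avx2", is_x86_feature_detected!("avx2")),
    ];
    checks
        .iter()
        .filter(|(_, supported)| !*supported)
        .map(|(name, _)| *name)
        .collect()
}

/// On non-x86 targets these features do not apply, so nothing is reported.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn missing_features() -> Vec<&'static str> {
    Vec::new()
}

fn main() {
    let missing = missing_features();
    if missing.is_empty() {
        println!("host supports all proposed features");
    } else {
        println!("host lacks: {:?}", missing);
    }
}
```

A binary compiled with the proposed flags would presumably fail with an illegal-instruction error on a host that this check reports as lacking features, which is the trade-off the survey numbers are meant to quantify.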