datafusion-contrib / datafusion-python

Python binding for DataFusion
https://arrow.apache.org/datafusion/python/index.html
Apache License 2.0

Use custom allocator in Python build #27

Closed Dandandan closed 2 years ago

Dandandan commented 2 years ago

When compiling in native Rust, we have the option to include mimalloc or snmalloc for improved performance. In my experience, mimalloc and snmalloc behave quite similarly.

Polars uses mimalloc in the Python binary: https://github.com/pola-rs/polars/blob/master/py-polars/Cargo.toml#L22
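For anyone unfamiliar with how an allocator gets swapped in, here is a minimal sketch of the `#[global_allocator]` hook using only the standard library. `CountingAlloc` and `allocation_count` are hypothetical names for illustration; the wrapper just forwards to the system allocator and counts calls. As far as I know, the mimalloc and snmalloc crates plug in through the same attribute (with `mimalloc::MiMalloc` or `snmalloc_rs::SnMalloc` as the static's type), but this is not the actual change proposed for this repo.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper allocator: forwards every request to the system
/// allocator and counts how many allocations were made.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// This attribute is the switch: every heap allocation in the final binary
// (or cdylib, for a Python extension) goes through the static named here.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocations observed so far.
fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocation_count();
    let data = vec![0u8; 4096];
    // black_box keeps the compiler from optimizing the allocation away.
    std::hint::black_box(&data);
    let after = allocation_count();
    println!("allocations while building the Vec: {}", after - before);
}
```

Only one `#[global_allocator]` may exist in the whole dependency graph, so it belongs in the crate that produces the final artifact rather than in a library crate.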

matthewmturner commented 2 years ago

Do you think this is the only optimization we'll be able to add? If so, I'll close my issue.

Dandandan commented 2 years ago

> Do you think this is the only optimization we'll be able to add? If so, I'll close my issue.

I think this + upgrading the CPU feature list will be the main two. LTO is already enabled.

matthewmturner commented 2 years ago

Cross-post from Slack.

Wow. Updating the allocator to snmalloc made a big difference. This run also included the full target feature list. However, when I tried just the target features there was minimal impact (maybe I did it wrong?). I'm not sure whether some combination of features plus allocator also helped. There's still some juice to squeeze, but the results compared to the others are quite good across the board. Exciting!

q1: 0.05378841600000017
q2: 0.3460620409999997
q3: 1.1944592499999995
q4: 0.051161000000000456
q5: 1.214791333
q6: 1.3261748329999996
q7: 1.3834479999999996
q8: 3.088559250000001
q9: 0.6630860410000015
q10: 17.969405375

This is how I built the Python wheels, including the feature list, in case anyone is interested:

export RUSTFLAGS='-C target-feature=+fxsr,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+aes,+avx,+avx2' && maturin build --release

I will probably start working on the automation script next so others can test and we can be sure that what I've done is correct and accurate.
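A wheel built with a fixed feature list can crash with an illegal-instruction error on a CPU that lacks one of those features, so it may be worth checking the host first. The sketch below uses the standard library's `is_x86_feature_detected!` macro on the features named in this thread; `feature_support` and `missing_features` are hypothetical helper names, and on non-x86 hosts the check simply returns nothing. It should be compiled without the RUSTFLAGS above, since a feature enabled at compile time is reported as present without querying the CPU.

```rust
/// Features from the RUSTFLAGS list in this thread, paired with the result
/// of runtime detection on the current CPU.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn feature_support() -> Vec<(&'static str, bool)> {
    vec![
        ("fxsr", std::is_x86_feature_detected!("fxsr")),
        ("sse", std::is_x86_feature_detected!("sse")),
        ("sse2", std::is_x86_feature_detected!("sse2")),
        ("sse3", std::is_x86_feature_detected!("sse3")),
        ("ssse3", std::is_x86_feature_detected!("ssse3")),
        ("sse4.1", std::is_x86_feature_detected!("sse4.1")),
        ("sse4.2", std::is_x86_feature_detected!("sse4.2")),
        ("popcnt", std::is_x86_feature_detected!("popcnt")),
        ("aes", std::is_x86_feature_detected!("aes")),
        ("avx", std::is_x86_feature_detected!("avx")),
        ("avx2", std::is_x86_feature_detected!("avx2")),
    ]
}

/// On other architectures the x86 feature names do not apply.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn feature_support() -> Vec<(&'static str, bool)> {
    Vec::new()
}

/// Names of the listed features that the current CPU does not report.
fn missing_features() -> Vec<&'static str> {
    feature_support()
        .into_iter()
        .filter(|(_, supported)| !*supported)
        .map(|(name, _)| name)
        .collect()
}

fn main() {
    let missing = missing_features();
    if missing.is_empty() {
        println!("no listed feature is missing (or this is not an x86 host)");
    } else {
        println!("missing features: {}", missing.join(", "));
    }
}
```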