datafusion-contrib / datafusion-python

Python binding for DataFusion
https://arrow.apache.org/datafusion/python/index.html
Apache License 2.0

Add custom global allocator #30

Closed: matthewmturner closed this 2 years ago

matthewmturner commented 2 years ago

Closes #27

matthewmturner commented 2 years ago

@Dandandan FYI

Dandandan commented 2 years ago

Did you also try mimalloc (see https://github.com/pola-rs/polars/blob/master/py-polars/Cargo.toml#L22)? It should give similar results, though it might be faster or slower on some queries. I'm still interested in which one is the best to pick.
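For context on what "swapping the allocator" involves: in Rust the process-wide allocator is chosen by a single `#[global_allocator]` static, so trying mimalloc instead of snmalloc mostly means changing that static's type (with the `mimalloc` crate it would be `static GLOBAL: MiMalloc = MiMalloc;`). The sketch below is not this PR's code; it is a minimal, std-only illustration of the mechanism, assuming a hypothetical `CountingAlloc` wrapper that forwards to the system allocator and counts allocations, plus a hypothetical `allocation_count` helper.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical wrapper: forwards every request to the system allocator
// and counts how many allocations have been requested through it.
struct CountingAlloc {
    allocations: AtomicUsize,
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.allocations.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Only one #[global_allocator] may exist per binary. With the mimalloc
// crate, this static would instead be of type `MiMalloc`.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc {
    allocations: AtomicUsize::new(0),
};

// Hypothetical helper: reads the running allocation count.
fn allocation_count() -> usize {
    GLOBAL.allocations.load(Ordering::Relaxed)
}

fn main() {
    let before = allocation_count();
    // black_box discourages the optimizer from eliding the allocation.
    let v: Vec<u64> = black_box((0..1_000).collect());
    let after = allocation_count();
    println!(
        "vec of {} elements triggered {} allocation(s)",
        v.len(),
        after - before
    );
}
```

Because the allocator is fixed once for the whole binary, the query code itself does not change between allocators, which is what makes side-by-side benchmark runs like the ones in this thread cheap to repeat.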

matthewmturner commented 2 years ago

mimalloc results are below. It is faster on almost all queries, except q8 and q10, so it may be the better option.

q1: 0.0366870829999999
q2: 0.32823987499999996
q3: 1.1415230410000001
q4: 0.028246750000000098
q5: 1.1575570830000004
q6: 1.2152348750000006
q7: 1.1023409590000002
q8: 2.9869192910000004
q9: 0.5675542499999988
q10: 25.437209041

matthewmturner commented 2 years ago

@Dandandan to confirm: I haven't updated the PR with mimalloc yet. Did you approve it for snmalloc or for mimalloc?

Dandandan commented 2 years ago

Ah, I didn't catch that 😅. I think mimalloc would be a good choice to go forward with. It will be interesting to keep experimenting with this in the future (e.g. there is an upcoming snmalloc 2 version).

matthewmturner commented 2 years ago

OK, sounds good! Will update.