Azure / azure-kusto-python

Kusto client libraries for Python
MIT License

Support Arrow strings in dataframe_from_result_table #521

Closed cdeil closed 7 months ago

cdeil commented 9 months ago

Pandas now fully supports Arrow-backed memory, and specifically Arrow strings, which are set to become the default string type in the next major release (pandas 3.0): https://pandas.pydata.org/docs/whatsnew/v2.2.0.html#dedicated-string-data-type-backed-by-arrow-by-default

As mentioned in #372 we would very much appreciate memory / CPU improvements for our ADX -> pandas workloads.

Could you please add support for this in azure.kusto.data.helpers.dataframe_from_result_table, or add a new function, so that data can go efficiently from ADX to Arrow memory without passing through Python objects for strings?

cc @yihezkel

AsafMah commented 9 months ago

Hey, pandas 2.0 sadly requires Python 3.8. We still support Python 3.7, so we can't make this change yet. Once we drop support for 3.7, we can move forward with this.

cdeil commented 9 months ago

Do you have an ETA for dropping Python 3.7 support?

AsafMah commented 7 months ago

Hey, we will add support for pandas' Arrow strings in the next version, due this week or next. Rewriting the internals to be more efficient, potentially using Arrow, is something we want to move towards, but we don't have a timeline yet.

AsafMah commented 7 months ago

Version 4.4.0 supports pandas strings. Let me know if it works for your use case.
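
For readers who end up with a DataFrame whose string columns are still object dtype, the sketch below shows one way to convert them to pandas' string dtype after the fact, preferring the Arrow-backed variant when pyarrow is installed. It is not the library's implementation: the helper names `preferred_string_dtype` and `object_strings_to_string_dtype` are hypothetical, the sample DataFrame is a stand-in for a query result, and unique column names are assumed. Note that this still materializes Python strings first, so it does not address the direct ADX -> Arrow path requested above.

```python
import importlib.util

import pandas as pd


def preferred_string_dtype() -> str:
    """Pick an Arrow-backed string dtype when pyarrow is importable."""
    if importlib.util.find_spec("pyarrow") is not None:
        return "string[pyarrow]"
    # Fallback: pandas' nullable string dtype stored as Python objects.
    return "string[python]"


def object_strings_to_string_dtype(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df in which object columns holding only strings
    (and missing values) are converted to the preferred string dtype."""
    dtype = preferred_string_dtype()
    out = df.copy()
    for name in out.columns:  # assumes unique column names
        column = out[name]
        if not pd.api.types.is_object_dtype(column):
            continue
        # infer_dtype inspects the values; "string" means every
        # non-missing value is a str.
        if pd.api.types.infer_dtype(column, skipna=True) == "string":
            out[name] = column.astype(dtype)
    return out


# Hypothetical stand-in for a DataFrame produced from a query result.
frame = pd.DataFrame(
    {
        "name": pd.Series(["alpha", None, "gamma"], dtype=object),
        "count": [1, 2, 3],
    }
)
converted = object_strings_to_string_dtype(frame)
print(converted.dtypes)
```

Comparing `frame.memory_usage(deep=True)` with `converted.memory_usage(deep=True)` on real data is a simple way to check whether the Arrow-backed dtype actually helps for a given workload.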