Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

Add `PyArrow.table` or `PyArrow.dataset` as options for return objects in `MLTable` #32920

Open ion-elgreco opened 1 year ago

ion-elgreco commented 1 year ago

Is your feature request related to a problem? Please describe.

It's currently not possible to return a PyArrow table or PyArrow dataset with the MLTable library. This is very annoying because I can see in the code that there is a Rust reader for Delta Lake (probably using delta-rs), which returns PyArrow record batches that are then converted to a Pandas DataFrame. There are better libraries nowadays that use Arrow as their memory format, such as Polars.

So it would be much better if we could simply return a PyArrow table or PyArrow dataset, preferably both, because a dataset can be used for lazy execution.

Describe the solution you'd like

Add PyArrow table/dataset as possible return objects for the MLTable class, e.g. `to_pyarrow_table` and `to_pyarrow_dataset`.

Describe alternatives you've considered

There are no alternatives. You currently have to go through Pandas, which is about as inefficient as it gets and risks data loss due to type coercion.
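To illustrate the type-coercion concern: with its default NumPy-backed dtypes, pandas stores an integer column that contains nulls as float64, and float64 cannot represent every 64-bit integer exactly. The sketch below uses only the Python standard library to mimic that coercion on a single value. The identifier value is made up for illustration, and no MLTable or PyArrow API is involved.

```python
# A hypothetical int64 identifier just above 2**53, the limit up to which
# float64 can represent every integer exactly.
original_id = 2**53 + 1

# Mimic the column being coerced to float64 (Python's float is an IEEE 754
# double, the same representation as a float64 column).
coerced = float(original_id)

# Converting back to an integer does not recover the original value.
recovered_id = int(coerced)

print(original_id)                  # 9007199254740993
print(recovered_id)                 # 9007199254740992
print(original_id == recovered_id)  # False
```

Keeping the data in Arrow format end to end would avoid this particular round trip, since Arrow tracks nulls separately from the integer values.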

iscai-msft commented 1 year ago

Thanks for opening this issue, @ion-elgreco. Tagging @azureml-github to look into this issue.

YalinLi0312 commented 1 year ago

@ion-elgreco, which library are you using? Is it this one? azure-data-tables

annatisch commented 1 year ago

@YalinLi0312, no, I believe this issue is referring to this package, although it's not maintained in this repo: https://pypi.org/project/mltable/ This issue needs to be routed to the ML folks. I've corrected the labeling. :)

Dekermanjian commented 9 months ago

Hello, this is something I am also interested in seeing added. Are there any updates on this issue?

github-actions[bot] commented 6 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.