Open ion-elgreco opened 1 year ago
Thanks for opening this issue, @ion-elgreco. Tagging @azureml-github to look into this issue.
@ion-elgreco which library are you using? Is it this one: azure-data-tables?
@YalinLi0312, no, I believe this issue is referring to this package, although it's not maintained in this repo: https://pypi.org/project/mltable/ This issue needs to be routed to the ML folks. I've corrected the labeling. :)
Hello, this is something I am also interested in seeing added. Are there any updates on this issue?
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.
Is your feature request related to a problem? Please describe.
It's currently not possible to return a PyArrow table or PyArrow dataset with the MLTable library. This is very annoying because I can see in the code that there is a Rust reader for deltalake (probably using delta-rs), which returns PyArrow record batches, and these are then converted to a Pandas dataframe. There are better libraries nowadays that use Arrow as their memory format, such as Polars.
So, it would be way better if we could simply return a PyArrow table or PyArrow dataset, preferably both, because a dataset can be used for lazy execution.
Describe the solution you'd like
A clear and concise description of what you want to happen.
Add a PyArrow table/dataset as a possible return object for the MLTable class, e.g. (to_pyarrow_table, to_pyarrow_dataset).
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
There are no alternatives: you currently have to go through Pandas, which is inefficient and risks data loss due to type coercion.