dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Add support for Apache.Arrow.Types.TimestampType #6809

Closed Chiragjasuja closed 7 months ago

Chiragjasuja commented 10 months ago

System Information (please complete the following information):

Describe the bug: I am getting the exception below.

Unhandled exception. System.NotImplementedException: timestamp
   at Microsoft.Data.Analysis.DataFrame.AppendDataFrameColumnFromArrowArray(Field field, IArrowArray arrowArray, DataFrame ret, String fieldNamePrefix)
   at Microsoft.Data.Analysis.DataFrame.FromArrowRecordBatch(RecordBatch recordBatch)
   at Program.<Main>$(String[] args) in C:\Users\mrchi\source\repos\ApacheArrowExample\ApacheArrowExample\Program.cs:line 17
   at Program.<Main>(String[] args)

To Reproduce Steps to reproduce the behavior:

  1. Take a .arrow file with one column of type Apache.Arrow.Types.TimestampType.
  2. Read a record batch from it and call var dataframe = DataFrame.FromArrowRecordBatch(recordBatch);
  3. The call throws the exception above.
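
The steps above can be sketched as a small top-level program. This is only a sketch, assuming the Apache.Arrow and Microsoft.Data.Analysis NuGet packages are referenced; the file name data.arrow is a placeholder and does not come from the original report.

```csharp
using System;
using System.IO;
using Apache.Arrow;
using Apache.Arrow.Ipc;
using Microsoft.Data.Analysis;

// "data.arrow" is a hypothetical path to an Arrow IPC file that contains
// at least one column of type Apache.Arrow.Types.TimestampType.
using var stream = File.OpenRead("data.arrow");
using var reader = new ArrowFileReader(stream);

// Read the first record batch from the file.
RecordBatch recordBatch = await reader.ReadNextRecordBatchAsync();

// On the affected versions, this call throws
// System.NotImplementedException: timestamp
var dataframe = DataFrame.FromArrowRecordBatch(recordBatch);

// Only reached if the conversion succeeds.
Console.WriteLine(dataframe.Rows.Count);
```

The await at the top level matches the two Program entries in the stack trace, which is what the compiler generates for asynchronous top-level statements.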

Expected behavior: The record batch should be converted to a DataFrame with an appropriate type to handle timestamps (DateTime, maybe).

totalgit74 commented 10 months ago

I'd like to see this implemented. Time series require it, and using the Arrow IPC/Feather format to move data between .NET, R data.table, and Python DataFrame as required is very handy for building language-agnostic data stores and formats.

bhuesemann commented 9 months ago

Same issue here while processing Arrow data from BigQuery IPC. The timestamp data type is basic to many analysis use cases and thus needs to be supported.