dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Feature request: Add support for saving/loading IDataView to/from csv/tsv directly and other pandas-like functionalities #5347

Open LittleLittleCloud opened 4 years ago

LittleLittleCloud commented 4 years ago

This would be a useful feature when using ML.NET to build pipelines in a Jupyter notebook, especially in the data preprocessing steps.

Other useful functions can be

antoniovs1029 commented 4 years ago

Hi, @LittleLittleCloud.

Saving to and loading from TSV/CSV is already possible with TextLoader and TextSaver. See TextLoaderSaverCatalog.cs for these APIs. So I'm not sure what is being requested in this regard.
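
For anyone landing here, a minimal sketch of that round trip, assuming the Microsoft.ML NuGet package is referenced. The `HouseRow` type, the `CsvToTsv` helper, and the file names are hypothetical and only there for illustration; `LoadFromTextFile` and `SaveAsText` are the catalog extension methods mentioned above.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row type, used only for this illustration.
public class HouseRow
{
    [LoadColumn(0)] public float Size { get; set; }
    [LoadColumn(1)] public float Price { get; set; }
}

public static class CsvRoundTripSketch
{
    // Loads a CSV file into an IDataView and writes it back out as TSV.
    public static IDataView CsvToTsv(string csvPath, string tsvPath)
    {
        var mlContext = new MLContext();

        // CSV -> IDataView (lazy: the file is read when the data is consumed).
        IDataView data = mlContext.Data.LoadFromTextFile<HouseRow>(
            csvPath, separatorChar: ',', hasHeader: true);

        // IDataView -> TSV. schema: false leaves out the TextLoader schema
        // comment that SaveAsText writes at the top of the file by default.
        using (FileStream stream = File.Create(tsvPath))
        {
            mlContext.Data.SaveAsText(
                data, stream, separatorChar: '\t', headerRow: true, schema: false);
        }

        return data;
    }

    public static void Main()
    {
        // Write a tiny input file so the sketch is self-contained.
        File.WriteAllText("houses.csv", "Size,Price\n1.5,100\n2.5,200\n");

        CsvToTsv("houses.csv", "houses.tsv");
        Console.WriteLine(File.ReadAllText("houses.tsv"));
    }
}
```

Note that `SaveAsText` takes a `Stream` rather than a path, which may be part of why it feels less direct than a pandas-style `to_csv`.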

As for the other functions, they do sound helpful for Jupyter notebooks, and I think they'd be nice additions to ML.NET. By the way, on the dv[name] request, ML.NET does have an extension method called GetColumn:

https://github.com/dotnet/machinelearning/blob/a76936546f5e269fc1da61f33ab541389e445294/src/Microsoft.ML.Data/Utilities/ColumnCursor.cs#L25-L26
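
A rough sketch of how `GetColumn` can stand in for a `dv[name]`-style lookup, assuming the Microsoft.ML NuGet package is referenced. The `HouseRow` type, the `ReadPrices` helper, and the sample values are made up for illustration.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row type, used only for this illustration.
public class HouseRow
{
    public float Size { get; set; }
    public float Price { get; set; }
}

public static class GetColumnSketch
{
    // Builds a small in-memory IDataView and reads one column back by name.
    public static float[] ReadPrices()
    {
        var mlContext = new MLContext();

        var rows = new[]
        {
            new HouseRow { Size = 1.5f, Price = 100f },
            new HouseRow { Size = 2.5f, Price = 200f },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // GetColumn<T> returns a lazy IEnumerable<T>; ToArray materializes it.
        // The type argument has to match the column's type in the schema.
        return data.GetColumn<float>("Price").ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ReadPrices()));
    }
}
```

Unlike an indexer, the caller has to supply the column's element type up front, so this is closer to a typed accessor than to pandas' `df[name]`.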

antoniovs1029 commented 4 years ago

For the record, if users are looking for pandas-like features to create or read IDataViews compatible with ML.NET, there's the DataFrame project:

https://devblogs.microsoft.com/dotnet/an-introduction-to-dataframe/

https://github.com/dotnet/interactive/blob/main/samples/notebooks/csharp/Samples/HousingML.ipynb

Thanks to @eerhardt for the heads-up about it.