dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Move DataFrame into a separate repo without any dependency on ML.NET libraries #6887

Open asmirnov82 opened 7 months ago

asmirnov82 commented 7 months ago

Is your feature request related to a problem? Please describe.

Currently, the DataFrame release cycle depends on ML.NET: a major version of DataFrame is released together with ML.NET once a year. The DataFrame NuGet package also depends on Microsoft.ML.DataView, which some users do not need, for example those who use DataFrame for data analysis without any ML.NET features.

Having a separate DataFrame repo without a dependency on ML.NET would speed up development of new DataFrame features and shorten time to market. On the other hand, having ML.NET depend on a specific version of the DataFrame NuGet package, rather than always the latest one, increases the stability of ML.NET. It also allows ML.NET to support a wider range of .NET frameworks, while the newest version of DataFrame may drop support for legacy .NET Standard and target only the latest LTS release.

Describe the solution you'd like

ML.NET should provide extension methods for converting a DataFrame into an IDataView and back. It should also provide the ML-specific columns (currently VBufferDataFrameColumn).

The DataFrame shouldn't have any ML.NET-specific columns or functionality (only Apache Arrow-compatible types) and should have no dependencies on ML.NET packages.

The DataFrame should be moved to a separate repo.

luisquintanilla commented 7 months ago

Thanks for posting this proposal @asmirnov82. I like the proposal.

Thoughts @michaelgsharp @JakeRadMSFT @ericstj @stephentoub @tannergooding

JakeRadMSFT commented 7 months ago

I think we can do the following to resolve some of the issues without bringing in all the overhead of a different repo.

@asmirnov82 would this mostly solve existing issues?

asmirnov82 commented 7 months ago

@JakeRadMSFT, I agree. This should solve all the issues.