Open asmirnov82 opened 7 months ago
Thanks for posting this proposal @asmirnov82. I like the proposal.
Thoughts @michaelgsharp @JakeRadMSFT @ericstj @stephentoub @tannergooding
I think we can do the following to resolve some of the issues without bringing in all the overhead of a different repo.
@asmirnov82 would this mostly solve existing issues?
@JakeRadMSFT, I agree. This should solve all the issues.
Is your feature request related to a problem? Please describe.
Currently the DataFrame release cycle depends on ML.NET: a major version of DataFrame is released together with ML.NET once a year. The DataFrame NuGet package also has a dependency on Microsoft.ML.DataView, which is not needed by users who use DataFrame for data analysis without any ML.NET features.
Having a separate DataFrame repo without a dependency on ML.NET would speed up development of new DataFrame features and reduce time to market. Conversely, having ML.NET depend on a particular version of the DataFrame NuGet package rather than the latest one would increase the stability of ML.NET. It would also allow ML.NET to support a wider range of .NET frameworks, while the newest version of DataFrame could drop support for legacy .NET Standard and target only the latest LTS release.
Describe the solution you'd like
ML.NET should provide extension methods for converting a DataFrame to an IDataView and back. It should also provide the ML-specific columns (currently VBufferDataFrameColumn).
The DataFrame shouldn't have any ML.NET-specific columns or functionality (only Apache Arrow compatible types) and should have no dependencies on ML.NET packages.
The DataFrame should be moved to a separate repo.