dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

[DataFrame][Request] Multi-dimensional Index #5658

Open DanielTrommel opened 4 years ago

DanielTrommel commented 4 years ago

Thanks for adding the DataFrame type! It is brilliant. I have already tried it out, and the current functionality works great. I did a small performance comparison by porting some code from a Python project to C#: for code with row-specific operations (apply() in pandas), I get 50-100 times faster execution in .NET than with Python + pandas.

To be able to really use DataFrame in our projects, we would need to also have Multi-dimensional Indexing capabilities.

Is this something on the roadmap for the near future?

pgovind commented 4 years ago

That's great to hear. If you can, would you mind sharing your benchmark code/scenario here so we can consider adding it to our benchmarks?

Can you clarify what you mean by multi-dimensional indexing capabilities? Do you mean (a) the concept of an index (similar to pandas), or (b) syntax such as df[rowIndex][columnIndex]?

Option b already exists today. If you mean a, that is on the roadmap, but it is not being worked on currently. It is a substantial piece of work that would likely impact many of the APIs we currently support. If you or someone else wants to get started on it, that'd be welcome too!

DanielTrommel commented 4 years ago

Thanks for your answer. The comparison uses (sensitive) customer data and an industry-specific set of calculation methods (executed per row); it doesn't feel generic enough to provide as a benchmark (besides the requirement to anonymize it).

Indeed, I meant support for pandas-style indexing (option a), with support for multiple levels.

Great to hear that it is on the roadmap; I can imagine that it is a big chunk of work. I expect it to be a bridge too far for me, never having done C# development in a project setting, and this being a major addition.
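
For reference, here is a minimal pandas sketch of the kind of multi-level indexing (option a) discussed above, contrasted with plain positional access (closer to option b). The column name, level names, and values are made up for illustration; this is not a proposal for how the .NET DataFrame API would look.

```python
import pandas as pd

# Hypothetical sample data: one "sales" value per (region, year) pair.
index = pd.MultiIndex.from_tuples(
    [("EU", 2020), ("EU", 2021), ("US", 2020), ("US", 2021)],
    names=["region", "year"],
)
df = pd.DataFrame({"sales": [10, 12, 20, 25]}, index=index)

# Option (a): label-based lookup on the outer level only.
# The result is a sub-frame that is still indexed by the inner level (year).
eu = df.loc["EU"]

# Option (a): lookup on both levels at once yields a single value.
us_2021 = df.loc[("US", 2021), "sales"]

# Option (a): cross-section on the inner level only.
y2020 = df.xs(2020, level="year")

# Closer to option (b): purely positional access that ignores index labels.
first_cell = df.iloc[0, 0]
```

The point of the index levels is that rows can be selected by meaningful labels at any level, rather than by position alone.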