dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Support for "Multi target regression models" (MTR) #2134

Open CESARDELATORRE opened 5 years ago

CESARDELATORRE commented 5 years ago

This feature request originated from a piece of customer feedback (quoted at the end of this comment).

Context/explanation:

Most machine learning models for regression problems predict a single target variable, which in the regression case is a numeric value.

However, other machine learning frameworks also provide "Multi target regression models" (MTR), as explained in the post linked below:

https://towardsdatascience.com/regression-models-with-multiple-target-variables-8baa75aacd

ML.NET currently doesn't have a built-in multi-output regression learner/trainer.

With ML.NET alone, you currently need a separately trained model for each target variable. If you want to predict 5 different target (dependent) variables, you need to create 5 different models instead of a single model that predicts all 5.
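The per-target workaround described above can be sketched roughly as follows. This is not ML.NET code: it is a minimal Python illustration, and the class names `SimpleLinearRegressor` and `PerTargetRegressor`, as well as the tiny data set, are hypothetical and made up for the example. It shows one independent single-target model being trained per output column and then hidden behind a single `predict` call.

```python
from typing import List, Sequence


class SimpleLinearRegressor:
    """Hypothetical single-target learner: y ~ slope * x + intercept."""

    def __init__(self) -> None:
        self.slope = 0.0
        self.intercept = 0.0

    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> "SimpleLinearRegressor":
        # Closed-form ordinary least squares for a single feature.
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = cov_xy / var_x
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


class PerTargetRegressor:
    """Trains one independent single-target model per output column."""

    def __init__(self) -> None:
        self.models: List[SimpleLinearRegressor] = []

    def fit(self, xs: Sequence[float], targets: Sequence[Sequence[float]]) -> "PerTargetRegressor":
        n_targets = len(targets[0])
        # Each target column gets its own model and its own training pass.
        self.models = [
            SimpleLinearRegressor().fit(xs, [row[j] for row in targets])
            for j in range(n_targets)
        ]
        return self

    def predict(self, x: float) -> List[float]:
        # One prediction still costs one model evaluation per target.
        return [model.predict(x) for model in self.models]


# Made-up data: x is a trip distance, targets are [fare, tip].
distances = [1.0, 2.0, 3.0, 4.0]
fare_and_tip = [[4.5, 0.5], [6.5, 1.0], [8.5, 1.5], [10.5, 2.0]]

wrapper = PerTargetRegressor().fit(distances, fare_and_tip)
print(wrapper.predict(5.0))  # approximately [12.5, 2.5]
```

Note that the wrapper only tidies up the calling code. It still trains, stores, and evaluates as many models as there are targets, which is the cost this issue is about.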

FEATURE:

Implementing this feature would let ML.NET support "Multi target regression models" (MTR) natively, without needing external frameworks like TensorFlow.
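For illustration only, here is what a single multi-target model can look like in the simplest (linear) case. This is a Python sketch, not a proposal for the ML.NET API: `MultiTargetLinearRegressor`, the helper functions, and the data are all hypothetical. It fits one weight matrix for all targets by solving the least-squares normal equations once, so a single `fit` and a single `predict` cover every output.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def transpose(a: Sequence[Sequence[float]]) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting.

    b may have several columns (one per target); all are solved together.
    """
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class MultiTargetLinearRegressor:
    """One linear model whose output is a vector with one value per target."""

    def __init__(self) -> None:
        self.weights: Matrix = []  # shape: (n_features + 1) x n_targets

    def fit(self, features: Matrix, targets: Matrix) -> "MultiTargetLinearRegressor":
        x = [[1.0] + list(row) for row in features]  # prepend a bias column
        xt = transpose(x)
        # Normal equations: (X^T X) W = X^T Y, solved once for all targets.
        self.weights = solve(matmul(xt, x), matmul(xt, targets))
        return self

    def predict(self, row: Sequence[float]) -> List[float]:
        x = [1.0] + list(row)
        n_targets = len(self.weights[0])
        return [sum(xi * w[j] for xi, w in zip(x, self.weights)) for j in range(n_targets)]


# Made-up data generated from t1 = 1 + 2a + 3b and t2 = -1 + 0.5a.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
targets = [[3.0, -0.5], [4.0, -1.0], [6.0, -0.5], [8.0, 0.0]]

model = MultiTargetLinearRegressor().fit(features, targets)
print(model.predict([3.0, 2.0]))  # approximately [13.0, 0.5]
```

For plain least squares the fitted weights are the same as those of independent per-target fits. The practical difference is that there is one model object to train, ship, and evaluate. Non-linear multi-target learners (for example, neural networks with several output units) can additionally share learned structure across targets.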


CUSTOMER FEEDBACK:

ta.speot.is: Hi, thanks for improving ML.NET. I’ve spent a little bit of time with it and it’s nice to have a first-class .NET API for Machine Learning.

Right now I’m using ML.NET very much like described in “Tutorial: Predict New York taxi fares using a regression learner with ML.NET” but I’m wondering how to build on it. Presently I’m predicting one attribute (in the tutorial’s case: the taxi fare) but I have more complicated scenarios I want to predict that involve multiple attributes (using the tutorial’s domain it would be predicting, say, taxi fare AND a surge charging multiple e.g. 1.0x, 1.5x).

Trying to make “Score” an array of floats didn’t work. The glossary on MSDN says that for regression “the output is a real value, for example, double”, i.e. one value, so this was to be expected.

Obviously I could train 10 models for the 10 attributes I want to predict but I feel like there’s a better way.

If anybody has any thoughts I’d appreciate any suggestions!


zeahmed commented 5 years ago

Currently, ML.NET's TensorFlow training and scoring component can be used to achieve this. Please see the following example: https://github.com/zeahmed/DeepLearningWithMLdotNet/tree/master/NYCTaxiMultiOutputRegression

baruchiro commented 5 years ago

Any update on this issue?

maamardli commented 4 years ago

Any update on this issue?

sheganinans commented 4 years ago

Just want to voice my very strong support for this feature.

Otherwise ML.NET is an excellent framework, thanks!

KeithT commented 3 years ago

Adding a supporting voice for this. Currently having to train many models...........:o(

lukaszadamus commented 3 years ago

any updates?

mruch2 commented 3 years ago

@CESARDELATORRE any updates on this?

briacht commented 3 years ago

Thanks for bringing this back up!

Could you tell me a little bit more about the types of business problems you'd like to solve with multi target regression models and why training multiple regression models does not fulfill your requirements?

mruch2 commented 3 years ago

@briacht I have spent my career in manufacturing, CAD design, and 3D rendering. The problem with multiple models is the time it takes to evaluate each one. In a product configurator, or when interpreting dimensional changes in real time, running the inputs against several models to tell a user whether a design is good, bad, or even feasible makes for a bad experience. In some cases these results are presented to the user as multiple outputs. There are also scenarios where a value would normally take a great deal of calculation to obtain, and we cut that time out by using machine learning to estimate things like volume, area, and load.

In engineering and design, control values are changed rapidly by a few thousandths of an inch or tenths of a millimeter, so loading screens between each change are extremely unfavorable.

Training and maintaining multiple models with exactly the same features, just to produce a different output value, is tedious and overcomplicated.

In my opinion this feature will help not only my industry but others as well. I also think that extending this library to match or surpass what other languages offer helps drive more users to .NET, especially with the combination of Blazor and client-side processing against ML.NET models.

briacht commented 3 years ago

@mruch2 Thanks for the feedback.

A few follow-up questions:

  1. What are your requirements in terms of model inference and evaluation time?
  2. Is multi-label classification also something you're interested in, or only the regression case?
  3. What would be your ideal experience for multi target regression models in ML.NET?

lukaszadamus commented 3 years ago

In my case, it's about using AI to predict hearing loss. We have a huge number of real audiograms for each age group. It is possible to ask the patient their age, then measure 4 different frequencies (two for the left ear and two for the right) to predict a full audiogram. An audiogram has multiple values, for example 5 frequencies per ear, which is why we were looking for MTR. In the end, we went with Bayesian statistics, which works fine for us.

mruch2 commented 3 years ago

@briacht I don't think users would complain about a screen flicker here and there. To be honest, if only a single model had to be evaluated, then based on current timings the user would never notice it in our use case. I realize this is an assumption, but based on how it currently works I am fairly confident that additional predicted values would not add considerable time, since the model does not change.

Extending the ML.NET library to cover more machine learning functions only further pushes people to use C# and drop other languages. With Blazor and .NET 6 coming out, Microsoft has basically given its developers a one-stop shop for what's hot in developer and business circles right now. ML.NET has been a laggard, though: it does some amazing things, but it still pushes people to learn and use other languages because of how quickly other machine learning libraries let users work. Time is money, and in this case a couple of major functional limitations can mean a great deal of time spent creating multiple models for a task that a single model handles in other libraries.

In addition, as more and more companies and developers are required to do model-based predictions in real time, there are probably several things to consider for enhancing the ML.NET library. In our use case it is quite a pain to carry multiple models around in memory for quick access just so we can keep our code base in C# and easily get things working in WASM.

I think the initial goal, just to stay competitive, should be to match TensorFlow's functionality while keeping the same one-liner approach that was clearly used to design the current ML.NET.