dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License
9.05k stars 1.88k forks source link

FastForestOva has its prediction value in Score[1] instead of Score[0] #6037

Open andrasfuchs opened 2 years ago

andrasfuchs commented 2 years ago

System Information (please complete the following information):

Describe the bug Usually the probabilities of the trained models are stored in the Score values of the ModelOutput class (generated by ML.NET Model Builder). The first item in that array of floats is the prediction value when we use the FastTreeOva or LightGbmMulti algorithms, but interestingly it is the second item in that array if we use FastForestOva. I'm not 100% sure that it is a bug, but I would expect consistency in this regard among the different algorithms.

To Reproduce Steps to reproduce the behavior:

  1. Go to Solution Explorer
  2. Right click on the project, add Machine Learning Model
  3. Start training your model, and stop at the different times to have different algorithms as best
  4. Test the generated classes by running them with your input data and show the ModelOutput.Score array
  5. Observe the different values and their meaning, especially for the FastForestOva algorithm

Expected behavior I would expect all the algorithms to return their prediction values in the same order.

Screenshots, Code, Sample Projects Accuracy is calculated based on OutputModel.Score[0] values and IsAttached uses LightGbmMulti, IsNotAttached uses FastForestOva. They both had 99+% accuracy during training. image

Additional context I could test FastTreeOva, LightGbmMulti and FastForestOva: the first two had their prediction value in Score[0], and FastForestOva had its in Score[1].

michaelgsharp commented 2 years ago

I would agree that consistency is important... we will have to see what is causing this.

@JakeRadMSFT is this possibly due to how model builder is generating the Model Output file? Or do we need to go dig through the ML.NET code and see whats going on?

michaelgsharp commented 2 years ago

@andrasfuchs would you be able to share your training pipeline? Both with this one and #6035?

andrasfuchs commented 2 years ago

Sure, I use Visual Studio 2022 v17.0.4 with the ML.NET Model Builder 2022 v16.9.2.2205603 extension. I have a C# console application that includes these mbconfigs to generate the ML models: https://drive.google.com/file/d/1CZJitEjEd3GEhBbHZUMOgEbRg3GaPsMD/view?usp=sharing

Here you can find the data that I used for training and for the model accuracy measurements: https://drive.google.com/file/d/1C2Of2UIHN2y7l5J-SvEvacFhTAHMCCcm/view?usp=sharing

My source code is work in progress, but you can check it out here: https://github.com/andrasfuchs/BioBalanceDetector/tree/master/Software/Source/BBDProto08/BBD.SleepLogger

Let me know if you need anything more!

andrasfuchs commented 2 years ago

Hi guys, did you have the chance to look into this? Do you need any more data from me to reproduce the problem?