dotnet / samples

Sample code referenced by the .NET documentation
https://docs.microsoft.com/samples/browse
Creative Commons Attribution 4.0 International

Bug in the Regression Tutorial #4691

Closed: caitaozhan closed this issue 3 years ago.

caitaozhan commented 3 years ago

https://github.com/dotnet/samples/blob/b4e9a57ac3f705a9bb09f059149f883a7b400ad4/machine-learning/tutorials/TaxiFarePrediction/Program.cs#L125

Hi, the predicted fare here is 0. There must be a bug somewhere. I am very surprised that a tutorial would have this kind of bug.

luisquintanilla commented 3 years ago

Hi @caitaozhan

Apologies for the delayed response, and thanks for raising this issue. In the future, to get a faster response on documentation-related issues, please raise them in the dotnet/docs repo (https://github.com/dotnet/docs/issues/new/choose).

I'll look into this.

luisquintanilla commented 3 years ago

@caitaozhan the 0 you see there is part of the format string for floating-point numbers. In this case, it formats the value to up to 4 decimal places.
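A minimal C# sketch of how that `0.####` custom format string behaves (`FareFormatting`, `FormatDemo`, and their method names are hypothetical, used only for illustration). A `0` placeholder always prints a digit, while a `#` placeholder prints one only if it is significant, so a prediction of exactly zero prints as a bare `0` and a real prediction keeps up to four decimals. The sketch also shows the `#.##` format used for the RMSE line, which drops the leading zero for values below 1. `CultureInfo.InvariantCulture` is passed so the decimal separator is always `.`; the interpolated string in the tutorial uses the current culture instead.

```csharp
using System;
using System.Globalization;

static class FareFormatting
{
    // "0.####": at least one integer digit, then up to four decimal digits;
    // '#' positions that would only hold trailing zeros are dropped.
    public static string FourDecimals(float value) =>
        value.ToString("0.####", CultureInfo.InvariantCulture);

    // "#.##": the integer part is optional, so values below 1 lose the leading zero.
    public static string TwoDecimalsNoLeadingZero(double value) =>
        value.ToString("#.##", CultureInfo.InvariantCulture);
}

class FormatDemo
{
    static void Main()
    {
        Console.WriteLine(FareFormatting.FourDecimals(14.6362f));          // 14.6362
        Console.WriteLine(FareFormatting.FourDecimals(15.5f));             // 15.5
        Console.WriteLine(FareFormatting.FourDecimals(0f));                // 0
        Console.WriteLine(FareFormatting.TwoDecimalsNoLeadingZero(0.89));  // .89
    }
}
```

With these rules, a printed `0` means the predicted value really was zero, so an output of `Predicted fare: 0` points at the model or the prediction class rather than at the format string.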

When you ran the application, did you get 0 as your output?

When I ran it, this is what I got:

C:\Users\luquinta.REDMOND\Docs\samples\machine-learning\tutorials\TaxiFarePrediction\bin\Debug\netcoreapp3.1
=============== Create and Train the Model ===============
=============== End of training ===============

*************************************************
*       Model quality metrics evaluation
*------------------------------------------------
*       RSquared Score:      0.89
*       Root Mean Squared Error:      3.3
*************************************************
**********************************************************************
Predicted fare: 14.6362, actual fare: 15.5
**********************************************************************

C:\Users\luquinta.REDMOND\Docs\samples\machine-learning\tutorials\TaxiFarePrediction\bin\Debug\netcoreapp3.1\TaxiFarePrediction.exe (process 15440) exited with code 0.
To automatically close the console when debugging stops, enable Tools->Options->Debugging->Automatically close the console when debugging stops.
Press any key to close this window . . .
caitaozhan commented 3 years ago

This is my output.

C:\Users\t-caitaozhan\source\repos\CSharpLearning\TaxiFarePrediction\bin\Debug\net5.0
=============== Create and Train the Model ===============
=============== End of training ===============

*************************************************
*       Model quality metrics evaluation
*------------------------------------------------
*       RSquared Score:      0.89
*       Root Mean Squared Error:      3.3
*************************************************
**********************************************************************
Predicted fare: 0, actual fare: 15.5
**********************************************************************
caitaozhan commented 3 years ago

I tried both netcoreapp3.1 and net5.0. My ML.NET version is 1.5.5.

luisquintanilla commented 3 years ago

Thanks for that. Hmm... I just went through the tutorial step by step in a brand-new project and got the expected output:

*************************************************
*       Model quality metrics evaluation
*------------------------------------------------
*       RSquared Score:      0.89
*       Root Mean Squared Error:      3.3
**********************************************************************
Predicted fare: 14.6362, actual fare: 15.5
**********************************************************************

C:\Dev\MLADS-2021-mlnet-tutorial-live\Demo 2 - Tooling\TaxiFareProduction\bin\Debug\net5.0\TaxiFareProduction.exe (process 16008) exited with code 0.
To automatically close the console when debugging stops, enable Tools->Options->Debugging->Automatically close the console when debugging stops.
Press any key to close this window . . .

Here's what my Program.cs file looks like:

using System;
using System.IO;
using Microsoft.ML;

namespace TaxiFareProduction
{
    class Program
    {
        static readonly string _trainDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-train.csv");
        static readonly string _testDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-test.csv");
        static readonly string _modelPath = Path.Combine(Environment.CurrentDirectory, "Data", "Model.zip");

        static void Main(string[] args)
        {
            MLContext mlContext = new MLContext(seed: 0);

            var model = Train(mlContext, _trainDataPath);

            Evaluate(mlContext, model);

            TestSinglePrediction(mlContext, model);
        }

        public static ITransformer Train(MLContext mlContext, string dataPath)
        {
            IDataView dataView = mlContext.Data.LoadFromTextFile<TaxiTrip>(dataPath, hasHeader: true, separatorChar: ',');

            var pipeline = mlContext.Transforms.CopyColumns(outputColumnName: "Label", inputColumnName: "FareAmount")
                .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "VendorIdEncoded", inputColumnName: "VendorId"))
                .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "RateCodeEncoded", inputColumnName: "RateCode"))
                .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "PaymentTypeEncoded", inputColumnName: "PaymentType"))
                .Append(mlContext.Transforms.Concatenate("Features", "VendorIdEncoded", "RateCodeEncoded", "PassengerCount", "TripDistance", "PaymentTypeEncoded"))
                .Append(mlContext.Regression.Trainers.FastTree());

            var model = pipeline.Fit(dataView);

            return model;
        }

        private static void Evaluate(MLContext mlContext, ITransformer model)
        {
            IDataView dataView = mlContext.Data.LoadFromTextFile<TaxiTrip>(_testDataPath, hasHeader: true, separatorChar: ',');

            var predictions = model.Transform(dataView);

            var metrics = mlContext.Regression.Evaluate(predictions, "Label", "Score");

            Console.WriteLine();
            Console.WriteLine($"*************************************************");
            Console.WriteLine($"*       Model quality metrics evaluation         ");
            Console.WriteLine($"*------------------------------------------------");

            Console.WriteLine($"*       RSquared Score:      {metrics.RSquared:0.##}");
            Console.WriteLine($"*       Root Mean Squared Error:      {metrics.RootMeanSquaredError:#.##}");
        }

        private static void TestSinglePrediction(MLContext mlContext, ITransformer model)
        {

            var predictionFunction = mlContext.Model.CreatePredictionEngine<TaxiTrip, TaxiTripFarePrediction>(model);

            var taxiTripSample = new TaxiTrip()
            {
                VendorId = "VTS",
                RateCode = "1",
                PassengerCount = 1,
                TripTime = 1140,
                TripDistance = 3.75f,
                PaymentType = "CRD",
                FareAmount = 0 // To predict. Actual/Observed = 15.5
            };

            var prediction = predictionFunction.Predict(taxiTripSample);

            Console.WriteLine($"**********************************************************************");
            Console.WriteLine($"Predicted fare: {prediction.FareAmount:0.####}, actual fare: 15.5");
            Console.WriteLine($"**********************************************************************");
        }
    }
}
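The listing above uses `TaxiTrip` and `TaxiTripFarePrediction`, which are defined outside Program.cs and aren't shown here. A sketch of what those classes could look like, assuming the data files list their columns in the order vendor_id, rate_code, passenger_count, trip_time_in_secs, trip_distance, payment_type, fare_amount (adjust the `LoadColumn` indexes if your files differ). The member types follow how Program.cs uses them: the string columns feed `OneHotEncoding`, and the numeric columns are `float` so they can go into `Concatenate("Features", ...)`. It needs the Microsoft.ML NuGet package for the attributes.

```csharp
using Microsoft.ML.Data;

// One row of taxi-fare-train.csv / taxi-fare-test.csv.
// The LoadColumn indexes are an assumption about the CSV column order.
public class TaxiTrip
{
    [LoadColumn(0)] public string VendorId;
    [LoadColumn(1)] public string RateCode;
    [LoadColumn(2)] public float PassengerCount;
    [LoadColumn(3)] public float TripTime;
    [LoadColumn(4)] public float TripDistance;
    [LoadColumn(5)] public string PaymentType;
    [LoadColumn(6)] public float FareAmount;   // copied to "Label" by the pipeline
}

// The prediction engine fills FareAmount from the model's output column
// whose name matches the attribute exactly, so it must be "Score".
public class TaxiTripFarePrediction
{
    [ColumnName("Score")]
    public float FareAmount;
}
```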
luisquintanilla commented 3 years ago

I'm using the latest version of ML.NET, 1.6.0.

This is what my csproj file looks like:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <Folder Include="Data\" />
  </ItemGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.ML" Version="1.6.0" />
    <PackageReference Include="Microsoft.ML.FastTree" Version="1.6.0" />
  </ItemGroup>

  <ItemGroup>
    <None Update="Data\taxi-fare-test.csv">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </None>
    <None Update="Data\taxi-fare-train.csv">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </None>
  </ItemGroup>

</Project>
luisquintanilla commented 3 years ago

@caitaozhan I just tried netcoreapp3.1 and ML.NET 1.5.5 and got the same result. Can you please paste your csproj and Program.cs files? Thanks for your help with this.

caitaozhan commented 3 years ago

I found my bug:

using Microsoft.ML.Data;

public class TaxiTripFarePrediction
{
    [ColumnName("score")]    // Bug: lowercase "score" caused the issue. It should be "Score", with an uppercase S.
    public float FareAmount;
}

So why does upper/lower case matter? It is error-prone. Thanks.

I see why:

var metrics = mlContext.Regression.Evaluate(predictions, "Label", "Score");
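ML.NET matches column names exactly, including case, and the FastTree regression trainer names its output column `Score`. One way to check the exact names a trained pipeline produces is to print its output schema and copy the name from there into `[ColumnName(...)]`. A minimal sketch; `SchemaInspection` and `PrintOutputColumns` are hypothetical names, and `model`/`dataView` are assumed to be the fitted pipeline and training data from `Train`.

```csharp
using System;
using Microsoft.ML;

public static class SchemaInspection
{
    // Prints every visible column the fitted pipeline outputs for the given input,
    // so the exact, case-sensitive name can be copied into [ColumnName(...)].
    public static void PrintOutputColumns(ITransformer model, IDataView inputData)
    {
        DataViewSchema outputSchema = model.GetOutputSchema(inputData.Schema);
        foreach (DataViewSchema.Column column in outputSchema)
        {
            if (column.IsHidden)
                continue;   // skip columns shadowed by a later column of the same name
            Console.WriteLine($"{column.Name}: {column.Type}");
        }
    }
}
```

For example, calling `SchemaInspection.PrintOutputColumns(model, dataView);` right after `pipeline.Fit(dataView)` in `Train` should list `Score` alongside the input and intermediate columns such as `Label` and `Features`.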

luisquintanilla commented 3 years ago

Great! Glad we were able to get to the bottom of it 😀. The column name you set in the TaxiTripFarePrediction class is how the rest of the application sees that value, so it has to match exactly, including case. To prevent inconsistencies like this, it might be a good idea to use something like nameof, or to store the name in a constant and reference it everywhere.
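A minimal sketch of the constant-based approach, assuming the class names from the tutorial; `TaxiFareColumns` and `TaxiFareEvaluation` are hypothetical names. Attribute arguments must be compile-time constants, so a `const string` can be used both in `[ColumnName(...)]` and in the `Evaluate` call.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical helper: one place for the column names that the pipeline,
// the prediction class, and the evaluation code must agree on.
public static class TaxiFareColumns
{
    public const string Label = "Label";
    public const string Score = "Score";   // must match the trainer's output column
}

public class TaxiTripFarePrediction
{
    [ColumnName(TaxiFareColumns.Score)]
    public float FareAmount;
}

public static class TaxiFareEvaluation
{
    // Uses the same constants as the attribute, so the two can't drift apart.
    public static RegressionMetrics Evaluate(MLContext mlContext, ITransformer model, IDataView testData)
    {
        IDataView predictions = model.Transform(testData);
        return mlContext.Regression.Evaluate(predictions, TaxiFareColumns.Label, TaxiFareColumns.Score);
    }
}
```

The constant only keeps your own code consistent: its value still has to match the name the trainer actually emits (`Score` for the regression trainers). For input columns, `nameof` covers the other option mentioned, for example `inputColumnName: nameof(TaxiTrip.FareAmount)` in the `CopyColumns` step, since a field without a `[ColumnName]` attribute is exposed under its own member name.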

caitaozhan commented 3 years ago

Yup, thanks.