dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Mlnet CLI tool 16.2 generated model fails when being loaded to create a prediction engine #5580

Closed voges316 closed 3 years ago

voges316 commented 3 years ago

System information

Issue

After upgrading the mlnet CLI tool from v0.15.1 to v16.2, I am unable to load, through the C# API, a model that the most recent mlnet CLI tool generated from a Yelp sentiment dataset.

I get the following error:

Unhandled exception. System.ArgumentOutOfRangeException: Could not find input column 'col1' (Parameter 'inputSchema')
   at Microsoft.ML.Data.OneToOneTransformerBase.CheckInput(DataViewSchema inputSchema, Int32 col, Int32& srcCol)
   at Microsoft.ML.Data.OneToOneTransformerBase.OneToOneMapperBase..ctor(IHost host, OneToOneTransformerBase parent, DataViewSchema inputSchema)
   at Microsoft.ML.Transforms.ValueToKeyMappingTransformer.Mapper..ctor(ValueToKeyMappingTransformer parent, DataViewSchema inputSchema)
   at Microsoft.ML.Transforms.ValueToKeyMappingTransformer.MakeRowMapper(DataViewSchema schema)
   at Microsoft.ML.Data.RowToRowTransformerBase.Microsoft.ML.ITransformer.GetRowToRowMapper(DataViewSchema inputSchema)
   at Microsoft.ML.Data.TransformerChain`1.Microsoft.ML.ITransformer.GetRowToRowMapper(DataViewSchema inputSchema)
   at Microsoft.ML.Data.TransformerChain`1.Microsoft.ML.ITransformer.GetRowToRowMapper(DataViewSchema inputSchema)
   at Microsoft.ML.Data.TransformerChain`1.Microsoft.ML.ITransformer.GetRowToRowMapper(DataViewSchema inputSchema)
   at Microsoft.ML.PredictionEngineBase`2..ctor(IHostEnvironment env, ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.PredictionEngine`2..ctor(IHostEnvironment env, ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.PredictionEngineExtensions.CreatePredictionEngine[TSrc,TDst](ITransformer transformer, IHostEnvironment env, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.ModelOperationsCatalog.CreatePredictionEngine[TSrc,TDst](ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at MLConsoleApp.Program.Main(String[] args) in /home/devel/code/mlnet_example/MLConsoleApp/Program.cs:line 34

Source code / logs

Previous mlnet CLI tool: mlnet --version => 0.15.28007.4

Yelp labelled dataset from here: http://archive.ics.uci.edu/ml/machine-learning-databases/00331/sentiment%20labelled%20sentences.zip

Previous command to train a model on the dataset:

mlnet auto-train --task binary-classification --dataset data/yelp_labelled.txt --label-column-index 1 --has-header false --max-exploration-time 30 --name YelpDemo

I can load that model into a console app and run it fine. The screenshot shows the ModelInput.cs generated by the mlnet CLI tool. I can create another input class like it and use that class to load the model, create a prediction engine, and score a new input.

[image: ModelInput.cs generated by mlnet v0.15.1]

However, after I upgrade the mlnet CLI tool, this no longer works.

mlnet --version => 16.2.0

Command to train a model:

mlnet classification --dataset data/yelp_labelled.txt --has-header false --label-col 1 --train-time 30 --name YelpML16

The tool appears to print a warning about a header being detected in the dataset, even though this dataset has no header.

[image: header warning printed by mlnet v16.2]

When I try to import the model and create a prediction engine, I encounter the error I pasted above:

Unhandled exception. System.ArgumentOutOfRangeException: Could not find input column 'col1' (Parameter 'inputSchema')

Diving into the classes generated by mlnet v16.2, it appears that ModelInput.cs has changed. The second column is no longer a boolean Label, but a string Col1.

[image: ModelInput.cs generated by mlnet v16.2]
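Since the screenshot is not available here, the following is a rough reconstruction of what the v16.2 input class described above probably looks like. It is a sketch, not the generated file: the property names Col0 and Col1 and the column name 'col1' appear elsewhere in this thread, while the 'col0' column name and the LoadColumn indices are assumptions.

```csharp
using Microsoft.ML.Data;

// Sketch of the v16.2-style input class described in this issue.
// Assumption: column names are "col0"/"col1" and map to file columns 0 and 1.
public class ModelInput
{
    // Review text (first column of yelp_labelled.txt).
    [ColumnName("col0"), LoadColumn(0)]
    public string Col0 { get; set; }

    // Sentiment label, read as a string ("0" or "1") rather than a bool.
    [ColumnName("col1"), LoadColumn(1)]
    public string Col1 { get; set; }
}
```

The point of interest is the second property: because the label is typed as a string, the model's PredictedLabel output column is a string as well.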

This differs from the ModelInput.cs generated by v0.15.1, but even if I change my ModelInput class to match the mlnet CLI output, I still get an exception when loading the model and creating the prediction engine.

Unhandled exception. System.InvalidOperationException: Can't bind the IDataView column 'PredictedLabel' of type 'String' to field or property 'Prediction' of type 'System.Boolean'.
   at Microsoft.ML.Data.TypedCursorable`1..ctor(IHostEnvironment env, IDataView data, Boolean ignoreMissingColumns, InternalSchemaDefinition schemaDefn)
   at Microsoft.ML.Data.TypedCursorable`1.Create(IHostEnvironment env, IDataView data, Boolean ignoreMissingColumns, SchemaDefinition schemaDefinition)
   at Microsoft.ML.PredictionEngineBase`2.PredictionEngineCore(IHostEnvironment env, InputRow`1 inputRow, IRowToRowMapper mapper, Boolean ignoreMissingColumns, SchemaDefinition outputSchemaDefinition, Action& disposer, IRowReadableAs`1& outputRow)
   at Microsoft.ML.PredictionEngineBase`2..ctor(IHostEnvironment env, ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.PredictionEngine`2..ctor(IHostEnvironment env, ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.PredictionEngineExtensions.CreatePredictionEngine[TSrc,TDst](ITransformer transformer, IHostEnvironment env, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.ModelOperationsCatalog.CreatePredictionEngine[TSrc,TDst](ITransformer transformer, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at MLConsoleApp.Program.Main(String[] args) in /home/devel/code/mlnet_example/MLConsoleApp/Program.cs:line 37

michaelgsharp commented 3 years ago

Hey @voges316, I am looking into this. I have not had a chance to test this directly myself, but I noticed that the two commands you ran are different. On the old version you have --label-column-index 1 and on the new version --label-col 1. Can you see what happens if you use the old command with the latest version and let me know?

voges316 commented 3 years ago

@michaelgsharp yes, the CLI options changed from v0.15 to v16.x. Using the older --label-column-index option with v16.2 prints an error followed by the help menu.

Option '--label-col' is required.
Unrecognized command or argument '--label-column-index'
Unrecognized command or argument '1'

michaelgsharp commented 3 years ago

@voges316 this is interesting. With version 16.2.0 I also get the same warning about the header, and the label is a string for me as well, but it runs without errors. I have attached the solution that was generated when I ran it. Can you test it on your machine and see what happens? If it works, can you compare my project with yours? In theory they should be the same since they come from the same version, but something is obviously different.

Attachment: YelpML16.zip

voges316 commented 3 years ago

@michaelgsharp So I tried to import the model in the existing code, and it failed to even import.

# Import line
ITransformer predictionPipeline = mlContext.Model.Load("models/YelpML16.zip", out predictionPipelineSchema);

# Exception
Unhandled exception. System.InvalidOperationException: Could not load legacy format model
 ---> System.InvalidOperationException: Repository doesn't contain entry DataLoaderModel/Model.key

Here's the full exception if it helps:

[image: full exception output]

I also created a repo just to show what I was doing: https://github.com/voges316/MlnetModelImports

michaelgsharp commented 3 years ago

And you tried to import that using version 16.2.0, correct?

voges316 commented 3 years ago

Yes. mlnet v16.2.0 is what was used to generate the model. I am trying to import the model in dotnet-core, using Microsoft.ML v1.5.4
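When the generated classes and the saved model disagree, one way to see what the model actually expects is to print its schemas before writing any input or output class. The sketch below assumes the Microsoft.ML package is referenced and that the model was saved together with its input schema; the default path and the class name InspectModelSchema are placeholders, not part of the original project.

```csharp
using System;
using Microsoft.ML;

class InspectModelSchema
{
    static void Main(string[] args)
    {
        // Assumed path; pass a different one as the first argument.
        string modelPath = args.Length > 0 ? args[0] : "models/YelpML16.zip";

        var mlContext = new MLContext();

        // Load returns the transformer chain and the input schema saved with it.
        ITransformer model = mlContext.Model.Load(modelPath, out DataViewSchema inputSchema);

        if (inputSchema == null)
        {
            Console.WriteLine("The model file does not contain an input schema.");
            return;
        }

        Console.WriteLine("Input columns:");
        foreach (DataViewSchema.Column column in inputSchema)
            Console.WriteLine($"  {column.Name}: {column.Type}");

        // The output schema shows the type of PredictedLabel, Score, and so on.
        Console.WriteLine("Output columns:");
        foreach (DataViewSchema.Column column in model.GetOutputSchema(inputSchema))
            Console.WriteLine($"  {column.Name}: {column.Type}");
    }
}
```

Comparing the printed names and types with the properties of the ModelInput and ModelOutput classes should show directly which column fails to bind.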

michaelgsharp commented 3 years ago

So using your exact same project (after modifying the input/output classes to match the 16.2.0 version), it loaded and ran the model fine. I am wondering if there are multiple versions of ML.NET on your machine that are somehow conflicting with each other. Can you clear out your NuGet package folder and then rebuild and test again?

voges316 commented 3 years ago

So I deleted .nuget/packages/*, rebooted too for good measure, ran dotnet restore and dotnet run and encountered the same error.

Unhandled exception. System.InvalidOperationException: Can't bind the IDataView column 'PredictedLabel' of type 'String' to field or property 'Prediction' of type 'System.Boolean'.

michaelgsharp commented 3 years ago

I actually think that's progress. I had to change the ModelOutput to this:

    public class ModelOutput
    {
        //[ColumnName("PredictedLabel")]
        //public bool Prediction { get; set; }
        //public float Probability { get; set; }
        //public float Score { get; set; }

        // This is what 16.2.0 gives me
        [ColumnName("PredictedLabel")]
        public String Prediction { get; set; }
        // Note, the first value in the Score array corresponds to the prediction
        public float[] Score { get; set; }
    }

That should get you past the current error you are seeing.

You should then see an error on this line:

Console.WriteLine($"Text: {input.Col0} | Prediction: {(Convert.ToBoolean(result.Prediction) ? "Positive" : "Negative")} review | Probability of being positive: {result.Score[0]} ");

result.Prediction is coming back as "0", and the call to Convert.ToBoolean isn't happy with that. I changed it to this:

Console.WriteLine($"Text: {input.Col0} | Prediction: {(result.Prediction == "0" ? "Negative" : "Positive" )} review | Probability: {result.Score[0]} ");

Try that out and let me know what happens.
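For context, Convert.ToBoolean(string) only accepts the text "True" or "False" (ignoring case), so a label of "0" or "1" makes it throw a FormatException. A small helper along these lines is one way to keep the mapping in a single place. It is only a sketch: the class name SentimentLabels is hypothetical, and it assumes the dataset's convention that "1" means positive and "0" means negative.

```csharp
using System;

static class SentimentLabels
{
    // Maps the model's string label to display text.
    // Assumption: "1" = positive, "0" = negative, as in yelp_labelled.txt.
    public static string Describe(string predictedLabel)
    {
        switch (predictedLabel?.Trim())
        {
            case "1":
                return "Positive";
            case "0":
                return "Negative";
            default:
                // Fail loudly on anything else instead of guessing.
                throw new ArgumentException(
                    $"Unexpected label '{predictedLabel}'", nameof(predictedLabel));
        }
    }
}
```

With that in place, the output line could read Console.WriteLine($"Prediction: {SentimentLabels.Describe(result.Prediction)} review");.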

voges316 commented 3 years ago

Well that does indeed appear to fix it. So the actual modelinput & modeloutput generated by the mlnet cli tool differed between v0.15.1 and v16.2. That does make sense, but why is it saying columns that are 0/1 are 'strings' instead of booleans? But maybe that's just a code generation issue. Thanks for the help.

# dotnet run

Hello World!
=============== Single Prediction  ===============
Text: Meh, food was cold. | Prediction: Negative review | Probability of being positive: 0.00078324333 
================End of Process.Hit any key to exit==================================

michaelgsharp commented 3 years ago

It says they are strings because it parsed the input label column as a string, so the output label column is a string as well. I am still looking into why it does that, and why it thinks there is a header. It's possible that once those issues are resolved, the label will go back to being a boolean value.

Since your immediate issue has been resolved, I am going to go ahead and close this issue for now. I will shortly create issues to track the header and boolean problems. If you run into more problems, please feel free to either re-open this ticket (if it's the same issue) or create a new one as needed.