dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Not being able to generate code from model on ML.NET CLI #5303

Closed jualbina closed 4 years ago

jualbina commented 4 years ago

System information

Issue

Source code / logs

The library generated a log file, but since I was running over a dynamically generated Azure Machine Learning compute cluster, I was not able to retrieve it.

frank-dong-ms-zz commented 4 years ago

@jualbina could you please share the model file so we can investigate further? By the way, how was this model generated?

jualbina commented 4 years ago

@frank-dong-ms I've attached the model zip file. It has been created using ML.NET CLI via the command TrainTest.

[model-mlnet.zip](https://github.com/dotnet/machinelearning/files/4893592/model-mlnet.zip)

justinormont commented 4 years ago

Transferred back to the ML․NET repo as the MAML CLI used here is distinct from the AutoML MLNET CLI.

justinormont commented 4 years ago

Background

Looking at the model, it was trained with the following MAML command:

maml.exe TrainTest test=inputs/test.tsv tr=LightGBMBinary{iter=100} scorer=BinaryClassifierScorer eval=BinaryClassifierEvaluator norm=No cache=+ dout=outputs/pred.tsv loader=TextLoader{col=Features:R4:5-47,50-142 col=Label:R4:48} data=inputs/train.tsv out=outputs/model.zip seed=1

This is specified in the model zip at ./TrainingInfo/Command.txt.

Version is: 1.5.29002.0 @BuiltBy: root-bb5c9540127b @Branch: master @SrcCode: https://github.com/dotnet/machinelearning/tree/1ea2b470d6e04c8224823ee72c81d9a3f71039cf+1ea2b470d6e04c8224823ee72c81d9a3f71039cf (commit), as specified in the model zip at ./TrainingInfo/Version.txt.

Internally to Microsoft, MAML is created as an executable called maml.exe, which is distinct from the AutoML mlnet CLI (info).

In the public ML․NET repo, one can build the repo, then call the MAML CLI as: dotnet ./bin/AnyCPU.Release/Microsoft.ML.Console/netcoreapp2.1/MML.dll ... or dotnet ./bin/AnyCPU.Debug/Microsoft.ML.Console/netcoreapp2.1/MML.dll ... (if using a debug build)

For instance, the help for SaveModel is displayed as:

$ dotnet ./bin/AnyCPU.Release/Microsoft.ML.Console/netcoreapp2.1/MML.dll ? SaveModel

Help for Command: 'SavePredictorAs'
  Aliases: SavePredictor, SaveAs, SaveModel

Summary:
   Given a TLC model file with a predictor, we can output this same predictor in multiple export formats.

inputModelFile=<string>  Model file containing the predictor (short form in)
summaryFile=<string>     File to save model summary (short form sum)
textFile=<string>        File to save in text format (short form text)
iniFile=<string>         File to save in INI format (short form ini)
codeFile=<string>        File to save in C++ code (short form code)
binaryFile=<string>      File to save in binary format (short form bin)

Issue

@jualbina is using the maml.exe command named SaveModel, which converts the model's trained predictor to raw C++ code, either for fast predictions or for direct integration into a C++ application without ONNX or ML․NET dependencies.

Repro

I was able to both reproduce the error and find a working method.

Reproducing the error

The "Error during class instantiation" message indicates a model loading issue. The model was trained using an ML․NET version taken on 2020-06-02 (commit).

On a hunch, I tried running the command on an earlier version of ML․NET.

Using a commit from a couple of months earlier than the model, and running the same command as @jualbina on the model, I get the same error:

$ dotnet ./bin/AnyCPU.Release/Microsoft.ML.Console/netcoreapp2.1/MML.dll saveModel in=/Users/justinormont/Downloads/model-mlnet.zip code=/tmp/out.cpp

Error during class instantiation
One of the identified items was in an invalid format.
Error log has been saved to '/var/folders/0j/nthv5ntn75v45r3d7_6zl_lr0000gr/T/TLC/Error_20200711_083403_59d52a97-a11b-4f0c-a2ab-e8de442e3a51.log'. Please refer to https://aka.ms/MLNetIssue if you need assistance.

Contents of the log:

--- Command line args ---
saveModel in=/Users/justinormont/Downloads/model-mlnet.zip code=/tmp/out.cpp
--- Exception message ---
(1) Unexpected exception: Error during class instantiation, 'System.InvalidOperationException'
   at Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstanceCore(Object[] ctorArgs) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/ComponentModel/ComponentCatalog.cs:line 254
   at Microsoft.ML.Runtime.ComponentCatalog.TryCreateInstance[TRes](IHostEnvironment env, Type signatureType, TRes& result, String name, String options, Object[] extra) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/ComponentModel/ComponentCatalog.cs:line 1028
   at Microsoft.ML.Runtime.ComponentCatalog.TryCreateInstance[TRes,TSig](IHostEnvironment env, TRes& result, String name, String options, Object[] extra) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/ComponentModel/ComponentCatalog.cs:line 993
   at Microsoft.ML.ModelLoadContext.TryLoadModelCore[TRes,TSig](IHostEnvironment env, TRes& result, Object[] extra) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/Data/ModelLoading.cs:line 252
   at Microsoft.ML.ModelLoadContext.TryLoadModel[TRes,TSig](IHostEnvironment env, TRes& result, RepositoryReader rep, Entry ent, String dir, Object[] extra) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/Data/ModelLoading.cs:line 164
   at Microsoft.ML.ModelLoadContext.LoadModel[TRes,TSig](IHostEnvironment env, TRes& result, RepositoryReader rep, Entry ent, String dir, Object[] extra) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/Data/ModelLoading.cs:line 180
   at Microsoft.ML.Model.ModelFileUtils.LoadPipeline(IHostEnvironment env, RepositoryReader rep, IMultiStreamSource files, Boolean extractInnerPipe) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/Utilities/ModelFileUtils.cs:line 83
   at Microsoft.ML.Tools.SavePredictorUtils.LoadModel(IHostEnvironment env, Stream modelStream, Boolean loadNames, IPredictor& predictor, RoleMappedSchema& schema) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/Commands/SavePredictorCommand.cs:line 222
   at Microsoft.ML.Tools.SavePredictorUtils.SavePredictor(IHostEnvironment env, Stream modelStream, Stream binaryModelStream, Stream summaryModelStream, Stream textModelStream, Stream iniModelStream, Stream codeModelStream) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/Commands/SavePredictorCommand.cs:line 134
   at Microsoft.ML.Tools.SavePredictorCommand.Run() in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/Commands/SavePredictorCommand.cs:line 93
   at Microsoft.ML.Tools.Maml.MainCore(IHostEnvironment env, String args, Boolean alwaysPrintStacktrace) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Maml/MAML.cs:line 142
(2) Unexpected exception: Exception has been thrown by the target of an invocation., 'System.Reflection.TargetInvocationException'
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstanceCore(Object[] ctorArgs) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/ComponentModel/ComponentCatalog.cs:line 239
(3) Unexpected exception: One of the identified items was in an invalid format., 'System.FormatException'
   at Microsoft.ML.Data.TextLoader.Bindings..ctor(ModelLoadContext ctx, TextLoader parent) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 905
   at Microsoft.ML.Data.TextLoader..ctor(IHost host, ModelLoadContext ctx) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 1385
   at Microsoft.ML.Data.TextLoader.<>c__DisplayClass31_0.<Create>b__0(IChannel ch) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 1397
   at Microsoft.ML.Runtime.HostExtensions.Apply[T](IHost host, String channelName, Func`2 func) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/Data/IHostEnvironment.cs:line 247
   at Microsoft.ML.Data.TextLoader.Create(IHostEnvironment env, ModelLoadContext ctx) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 1397
   at Microsoft.ML.Data.TextLoader.Create(IHostEnvironment env, ModelLoadContext ctx, IMultiStreamSource files) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 1402

This demonstrates that an older version of ML․NET throws the same error when converting this model, which was trained using a newer version of ML․NET. Given that the same error message is produced, use of an older version may be the cause.

Working method

Re-running with the current ML․NET (commit), I am able to load the model and run the SaveModel command:

$ dotnet ./bin/AnyCPU.Release/Microsoft.ML.Console/netcoreapp2.1/MML.dll saveModel in=/Users/justinormont/Downloads/model-mlnet.zip code=/tmp/out.cpp

Saving predictor as code
$ ls -l /tmp/out.cpp

-rw-r--r--  1 justinormont  wheel  125005 Jul 11 02:54 /tmp/out.cpp

$ cat /tmp/out.cpp

double treeOutput0=((f0 > 1E-35) ? ((f17 > 1E-35) ? ((f6 > 2.5) ? ((f15 > 1E-35) ? ((f6 > 5.5) ? -4.1431197758963716 : -3.6323201061697352) : -3.1958282300638658) : ((f44 > 0.195000008) ? ((f6 > 1.5) ? -2.6289826443678317 : -2.0098592950589507) : -2.8591690932041023)) : ((f16 > 1E-35) ? ((f44 > 0.145000011) ? ((f4 > 1E-35) ? -4.6961065826682935 : -3.0368167816891827) : -4.3404531947136045) : -4.7531186141770334)) : ((f44 > 0.135) ? ((f12 > 1E-35) ? ((f6 > 2.5) ? -4.1934780759940891 : ((f5 > 1E-35) ? -4.5853889286978635 : ((f14 > 1E-35) ? -4.5206674460948273 : ((f6 > 1.5) ? -3.4696550769387744 : ((f4 > 1E-35) ? -4.3401247613727048 : ((f44 > 0.265000015) ? -2.0773968928478768 : -2.8233083759993329)))))) : ((f41 > 1E-35) ? ((f44 > 0.295000017) ? -3.2215960321281374 : ((f37 > 1E-35) ? ((f6 > 3.5) ? -4.3197890275608941 : -3.367959889935114) : ((f38 > 0.665) ? -4.1519666532000175 : -4.7340575735217492))) : ((f16 > 1E-35) ? -4.2270974495324127 : -4.798200392723003))) : ((f44 > 0.055) ? ((f12 > 1E-35) ? ((f6 > 3.5) ? -4.5916831229203723 : ((f5 > 1E-35) ? -4.7472526313434225 : -3.9851756258762157)) : ((f37 > 1E-35) ? -4.3758418858919859 : -4.7682315811364893)) : -4.8564259204475553)));
double treeOutput1=((f44 > 0.0950000063) ? ((f0 > 1E-35) ? ((f15 > 1E-35) ? ((f17 > 1E-35) ? 0.51876041347717483 : ((f16 > 1E-35) ? 0.23844915627562097 : -0.52659607248340057)) : 0.76980092847536885) : ((f29 > 1E-35) ? -0.5020202132028635 : ((f44 > 0.215) ? ((f5 > 1E-35) ? -0.24098529859197132 : ((f16 > 1E-35) ? ((f4 > 1E-35) ? -0.10137776731058891 : ((f6 > 5.5) ? 0.1451586409551722 : 0.8463515356275334)) : ((f17 > 1E-35) ? ((f6 > 2.5) ? 0.12749536323616453 : 0.68698006196180861) : ((f59 > 0.245) ? -0.11873463129059679 : 0.86305787002510304)))) : ((f5 > 1E-35) ? -0.36133039550442869 : ((f16 > 1E-35) ? ((f4 > 1E-35) ? -0.34410402921705446 : ((f6 > 4.5) ? -0.048935219217333896 : 0.50127621522278965)) : ((f17 > 1E-35) ? ((f6 > 1.5) ? ((f6 > 4.5) ? -0.19454183399514344 : 0.13123999935338701) : 0.75782433708593122) : -0.32936302678924734)))))) : ((f17 > 1E-35) ? ((f6 > 2.5) ? ((f0 > 1E-35) ? 0.12562685499046936 : -0.29842738605892488) : 0.36845134376063815) : ((f44 > 0.035) ? ((f16 > 1E-35) ? ((f5 > 1E-35) ? -0.43778105122047478 : ((f4 > 1E-35) ? -0.44638891439384332 : ((f6 > 4.5) ? -0.23597893791353697 : 0.21150849365673696))) : -0.476604654235808) : -0.49901110336675775)));

... 96 trees removed for terseness ...

double treeOutput98=((f49 > 1E-35) ? ((f49 > 0.195000008) ? -0.00022738581909925596 : ((f133 > 0.825) ? -0.20692933604661623 : ((f40 > 0.625) ? ((f133 > 1E-35) ? ((f17 > 1E-35) ? 0.089141794844577016 : ((f54 > 440.5) ? 0.82955987384491381 : 0.23208197986597387)) : ((f92 > 0.585) ? ((f98 > 3.505) ? 3.9131097311894059 : ((f59 > 1E-35) ? -0.12932198048333635 : ((f43 > 1441.5) ? 1.2647107042721135 : 0.12893076124699587))) : 0.15633101295636773)) : ((f48 > 0.335) ? -0.055700624379604977 : ((f38 > 0.425) ? ((f44 > 1E-35) ? ((f17 > 1E-35) ? -0.014964896356048392 : ((f55 > 0.915) ? 1.2596553527026073 : 0.13601358019970178)) : 0.28723981058404496) : 0.0032784726663138272))))) : ((f98 > 10.005) ? ((f24 > 1E-35) ? ((f119 > 0.085) ? -66.897916260618132 : -0.47176219379920842) : ((f4 > 1E-35) ? ((f98 > 10.205) ? -0.4917224738651696 : -61.654792517461935) : ((f17 > 1E-35) ? -0.77446400051294539 : ((f117 > 0.225000009) ? ((f38 > 0.945) ? 0.17514591682039302 : -126.90353223022542) : ((f98 > 10.205) ? -0.20219596910683429 : ((f38 > 0.945) ? -0.014808024296460189 : -93.088923831513839)))))) : ((f43 > 1E-35) ? ((f0 > 1E-35) ? 0.016995799605286254 : ((f13 > 1E-35) ? 0.23834903159346549 : -0.25445984471866523)) : 0.015090778080967775)));
double treeOutput99=((f44 > 0.105000004) ? ((f67 > 0.065) ? ((f49 > 0.655000031) ? 0.001039636731379961 : 0.014692256841068983) : ((f15 > 1E-35) ? -0.13035262122122973 : -0.014448856495556597)) : ((f52 > 0.125) ? ((f57 > 3.005) ? -0.012693473426511207 : ((f59 > 0.565) ? -0.20600958842103584 : ((f17 > 1E-35) ? -0.014763031175560904 : -0.083229482736673785))) : ((f6 > 14.5) ? ((f44 > 0.035) ? ((f57 > 9.865) ? 0.096799475886641409 : -0.02130705234139877) : -0.10826702284820437) : ((f98 > 9.225) ? 0.020569186314723247 : ((f110 > 0.175000012) ? ((f95 > 15.5) ? ((f17 > 1E-35) ? ((f44 > 0.025) ? -0.070712405134995951 : -0.22358255660176043) : -0.0097888396023686052) : -0.007415560000333987) : ((f4 > 1E-35) ? ((f44 > 0.035) ? 0.021272209566017521 : -0.13365597989740136) : ((f44 > 0.0950000063) ? -0.018407071409150662 : ((f5 > 1E-35) ? ((f0 > 1E-35) ? ((f14 > 1E-35) ? -0.47545369183635644 : 0.34057166745111389) : ((f44 > 0.045) ? ((f59 > 0.375) ? -0.042652325967856064 : 0.054950352706337045) : ((f56 > 0.335) ? -0.019974417187712246 : -0.11874998404893158))) : ((f43 > 6252.5) ? ((f16 > 1E-35) ? ((f66 > 0.065) ? 0.43932366063823791 : 0.12451309595592253) : ((f59 > 0.325000018) ? 0.06508584693583476 : -0.070883674944735769)) : 0.0052035291046983647)))))))));
double output = treeOutput0+treeOutput1+treeOutput2+treeOutput3+treeOutput4+treeOutput5+treeOutput6+treeOutput7+treeOutput8+treeOutput9+treeOutput10+treeOutput11+treeOutput12+treeOutput13+treeOutput14+treeOutput15+treeOutput16+treeOutput17+treeOutput18+treeOutput19+treeOutput20+treeOutput21+treeOutput22+treeOutput23+treeOutput24+treeOutput25+treeOutput26+treeOutput27+treeOutput28+treeOutput29+treeOutput30+treeOutput31+treeOutput32+treeOutput33+treeOutput34+treeOutput35+treeOutput36+treeOutput37+treeOutput38+treeOutput39+treeOutput40+treeOutput41+treeOutput42+treeOutput43+treeOutput44+treeOutput45+treeOutput46+treeOutput47+treeOutput48+treeOutput49+treeOutput50+treeOutput51+treeOutput52+treeOutput53+treeOutput54+treeOutput55+treeOutput56+treeOutput57+treeOutput58+treeOutput59+treeOutput60+treeOutput61+treeOutput62+treeOutput63+treeOutput64+treeOutput65+treeOutput66+treeOutput67+treeOutput68+treeOutput69+treeOutput70+treeOutput71+treeOutput72+treeOutput73+treeOutput74+treeOutput75+treeOutput76+treeOutput77+treeOutput78+treeOutput79+treeOutput80+treeOutput81+treeOutput82+treeOutput83+treeOutput84+treeOutput85+treeOutput86+treeOutput87+treeOutput88+treeOutput89+treeOutput90+treeOutput91+treeOutput92+treeOutput93+treeOutput94+treeOutput95+treeOutput96+treeOutput97+treeOutput98+treeOutput99;

As shown above, the LightGBMBinary model is successfully converted to C++ code. For anyone reading it: each tree is represented as nested ternary operators, and the outputs of all trees are added together on the last line to get the final predicted score.
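To make that structure concrete, here is a minimal sketch of the same pattern with two made-up trees. The thresholds, leaf values, and the wrapper function `score` are illustrative assumptions, not values taken from the attached model; the real output has 100 trees and is emitted as bare statements rather than inside a function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical two-tree ensemble written in the style of the generated file.
// All thresholds and leaf values are invented for illustration only.
double score(const std::vector<double>& f) {
    assert(f.size() >= 2);  // this sketch reads exactly two features
    const double f0 = f[0];
    const double f1 = f[1];

    // Each tree is a single nested ternary expression: internal nodes compare
    // one feature against a threshold, and leaves are constant contributions.
    double treeOutput0 = ((f0 > 0.5) ? ((f1 > 2.5) ? 1.0 : 0.25) : -0.5);
    double treeOutput1 = ((f1 > 1.5) ? 0.125 : -0.25);

    // The raw score is the sum of all tree outputs.
    double output = treeOutput0 + treeOutput1;
    return output;
}
```

With these invented values, an input of f0 = 1.0 and f1 = 3.0 takes the 1.0 leaf in the first tree and the 0.125 leaf in the second, so `score` returns 1.125.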

Cause

@jualbina : Are you using an older build of ML․NET for converting your model to C++ than you trained with? That would match my repro results.

When converting your model using the current master, I can convert successfully. I would try updating from master, building (./build.sh -Release), and re-running the MAML SaveModel command.

The older version that I tried cannot load this model. This is likely due to a change in the model format; for instance, we added a couple of bits to the model to denote new TextLoader options that support features like multi-line. That would be consistent with the FormatException thrown from the TextLoader.Bindings constructor in the log above, and may be why older versions can no longer load newer models. Backward compatibility is guaranteed; forward compatibility is not.

@harishsk et al.: Is there a way to throw a more useful error if a user loads a newer model than is supported by the current ML․NET version? Perhaps a warning, "Model was trained on a newer version of ML․NET. Forward compatibility is not guaranteed. Please update your ML.NET version", anytime a newer model is detected.
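As a rough sketch of what such a check could look like: compare the format version recorded in the model against the highest version the current build can read, and return the descriptive message when the model is newer. The function name `forwardCompatWarning` and the plain integer version numbers are hypothetical; this does not reflect how ML․NET's loader actually stores or compares versions.

```cpp
#include <string>

// Hypothetical check: both version numbers are illustrative integers,
// not fields from ML.NET's actual model format.
// Returns a warning message when the model is newer than this build can
// read, or an empty string when the model version is supported.
std::string forwardCompatWarning(unsigned modelVersion, unsigned maxReadable) {
    if (modelVersion > maxReadable) {
        return "Model was trained on a newer version of ML.NET. "
               "Forward compatibility is not guaranteed. "
               "Please update your ML.NET version";
    }
    return "";
}
```

Running a check like this before deserializing the loader would let the user see the version mismatch instead of a generic format exception.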

frank-dong-ms-zz commented 4 years ago

@jualbina please let us know if Justin's suggestion works for you, thanks. In the meantime, we will try to improve the error message.

frank-dong-ms-zz commented 4 years ago

This issue is caused by trying to convert a model (created with the latest code) using an older version of the code. As Justin commented, we support backward compatibility, but forward compatibility is not guaranteed. I created #5311 to follow up on improving the error message. Closing this issue.