Open nsingal opened 4 years ago
+1 One of the basic reasons to train a simple linear model (such as logistic regression) with n-gram features for a text classification problem is that it allows for rapid feature weight inspection and debugging. In ML.NET it seems impossible to programmatically access a fully specified feature weight table (feature name -> feature weight) for text n-gram features in an LR model, so there should at least be a way to save the model as a text file for inspection.
@harishsk: I think @nsingal is discussing the maml.exe SaveModel in=model.zip textFile=out.txt command of TLC/MAML. Within the TLC GUI, this is enabled with a checkbox on the left reading "Save Model as Text".
I left more information about the SaveModel command in https://github.com/dotnet/machinelearning/issues/5303#issuecomment-657040153:
@justinormont commented on 2020-07-11
Background
Looking at the model, this is trained w/ MAML command:
maml.exe TrainTest test=inputs/test.tsv tr=LightGBMBinary{iter=100} scorer=BinaryClassifierScorer eval=BinaryClassifierEvaluator norm=No cache=+ dout=outputs/pred.tsv loader=TextLoader{col=Features:R4:5-47,50-142 col=Label:R4:48} data=inputs/train.tsv out=outputs/model.zip seed=1
This is specified in the model zip at ./TrainingInfo/Command.txt.
Version is: 1.5.29002.0 @BuiltBy: root-bb5c9540127b @Branch: master @SrcCode: https://github.com/dotnet/machinelearning/tree/1ea2b470d6e04c8224823ee72c81d9a3f71039cf+1ea2b470d6e04c8224823ee72c81d9a3f71039cf (commit), as specified in the model zip at ./TrainingInfo/Version.txt.
Internally to Microsoft, MAML is created as an executable called maml.exe, which is distinct from the AutoML mlnet CLI (info).
In the public ML․NET repo, one can build the repo, then call the MAML CLI as:
dotnet ./bin/AnyCPU.Release/Microsoft.ML.Console/netcoreapp2.1/MML.dll ...
or
dotnet ./bin/AnyCPU.Debug/Microsoft.ML.Console/netcoreapp2.1/MML.dll ...
(if using a debug build)
For instance, the help for SaveModel is displayed as:
$ dotnet ./bin/AnyCPU.Release/Microsoft.ML.Console/netcoreapp2.1/MML.dll ? SaveModel
Help for Command: 'SavePredictorAs'
Aliases: SavePredictor, SaveAs, SaveModel
Summary: Given a TLC model file with a predictor, we can output this same predictor in multiple export formats.
inputModelFile=<string> Model file containing the predictor (short form in)
summaryFile=<string> File to save model summary (short form sum)
textFile=<string> File to save in text format (short form text)
iniFile=<string> File to save in INI format (short form ini)
codeFile=<string> File to save in C++ code (short form code)
binaryFile=<string> File to save in binary format (short form bin)
Issue
@jualbina is using the maml.exe command named SaveModel, which converts the model's trainer to raw C++ code for the purpose of fast predictions or to integrate directly within a C++ application without ONNX or ML․NET dependencies.
Repro
I was able to both reproduce the error, and show a working method.
Reproducing the error
The Error during class instantiation is a model loading issue. The model was trained using an ML․NET version taken on 2020-06-02 (commit).
On a hunch, I tried running the command on an earlier version of ML․NET.
Using a commit from a couple of months earlier than the model, and running the same command as @jualbina on the model, I get the same error:
$ dotnet ./bin/AnyCPU.Release/Microsoft.ML.Console/netcoreapp2.1/MML.dll saveModel in=/Users/justinormont/Downloads/model-mlnet.zip code=/tmp/out.cpp
Error during class instantiation
One of the identified items was in an invalid format.
Error log has been saved to '/var/folders/0j/nthv5ntn75v45r3d7_6zl_lr0000gr/T/TLC/Error_20200711_083403_59d52a97-a11b-4f0c-a2ab-e8de442e3a51.log'. Please refer to https://aka.ms/MLNetIssue if you need assistance.
Contents of the log:
--- Command line args ---
saveModel in=/Users/justinormont/Downloads/model-mlnet.zip code=/tmp/out.cpp
--- Exception message ---
(1) Unexpected exception: Error during class instantiation, 'System.InvalidOperationException'
  at Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstanceCore(Object[] ctorArgs) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/ComponentModel/ComponentCatalog.cs:line 254
  at Microsoft.ML.Runtime.ComponentCatalog.TryCreateInstance[TRes](IHostEnvironment env, Type signatureType, TRes& result, String name, String options, Object[] extra) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/ComponentModel/ComponentCatalog.cs:line 1028
  at Microsoft.ML.Runtime.ComponentCatalog.TryCreateInstance[TRes,TSig](IHostEnvironment env, TRes& result, String name, String options, Object[] extra) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/ComponentModel/ComponentCatalog.cs:line 993
  at Microsoft.ML.ModelLoadContext.TryLoadModelCore[TRes,TSig](IHostEnvironment env, TRes& result, Object[] extra) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/Data/ModelLoading.cs:line 252
  at Microsoft.ML.ModelLoadContext.TryLoadModel[TRes,TSig](IHostEnvironment env, TRes& result, RepositoryReader rep, Entry ent, String dir, Object[] extra) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/Data/ModelLoading.cs:line 164
  at Microsoft.ML.ModelLoadContext.LoadModel[TRes,TSig](IHostEnvironment env, TRes& result, RepositoryReader rep, Entry ent, String dir, Object[] extra) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/Data/ModelLoading.cs:line 180
  at Microsoft.ML.Model.ModelFileUtils.LoadPipeline(IHostEnvironment env, RepositoryReader rep, IMultiStreamSource files, Boolean extractInnerPipe) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/Utilities/ModelFileUtils.cs:line 83
  at Microsoft.ML.Tools.SavePredictorUtils.LoadModel(IHostEnvironment env, Stream modelStream, Boolean loadNames, IPredictor& predictor, RoleMappedSchema& schema) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/Commands/SavePredictorCommand.cs:line 222
  at Microsoft.ML.Tools.SavePredictorUtils.SavePredictor(IHostEnvironment env, Stream modelStream, Stream binaryModelStream, Stream summaryModelStream, Stream textModelStream, Stream iniModelStream, Stream codeModelStream) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/Commands/SavePredictorCommand.cs:line 134
  at Microsoft.ML.Tools.SavePredictorCommand.Run() in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/Commands/SavePredictorCommand.cs:line 93
  at Microsoft.ML.Tools.Maml.MainCore(IHostEnvironment env, String args, Boolean alwaysPrintStacktrace) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Maml/MAML.cs:line 142
(2) Unexpected exception: Exception has been thrown by the target of an invocation., 'System.Reflection.TargetInvocationException'
  at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
  at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
  at Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstanceCore(Object[] ctorArgs) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/ComponentModel/ComponentCatalog.cs:line 239
(3) Unexpected exception: One of the identified items was in an invalid format., 'System.FormatException'
  at Microsoft.ML.Data.TextLoader.Bindings..ctor(ModelLoadContext ctx, TextLoader parent) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 905
  at Microsoft.ML.Data.TextLoader..ctor(IHost host, ModelLoadContext ctx) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 1385
  at Microsoft.ML.Data.TextLoader.<>c__DisplayClass31_0.<Create>b__0(IChannel ch) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 1397
  at Microsoft.ML.Runtime.HostExtensions.Apply[T](IHost host, String channelName, Func`2 func) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Core/Data/IHostEnvironment.cs:line 247
  at Microsoft.ML.Data.TextLoader.Create(IHostEnvironment env, ModelLoadContext ctx) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 1397
  at Microsoft.ML.Data.TextLoader.Create(IHostEnvironment env, ModelLoadContext ctx, IMultiStreamSource files) in /Users/justinormont/Documents/src/machinelearning/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs:line 1402
This demonstrates an older version of ML․NET will throw the same error when converting this model trained using a newer version of ML․NET. Given the same error message is produced, the use of an old version may be the cause.
Working method
Re-running with the current ML․NET (commit), I am able to load the model and run the SaveModel command:
$ dotnet ./bin/AnyCPU.Release/Microsoft.ML.Console/netcoreapp2.1/MML.dll saveModel in=/Users/justinormont/Downloads/model-mlnet.zip code=/tmp/out.cpp
Saving predictor as code
$ ls -l /tmp/out.cpp
-rw-r--r-- 1 justinormont wheel 125005 Jul 11 02:54 /tmp/out.cpp
$ cat /tmp/out.cpp
double treeOutput0=((f0 > 1E-35) ? ((f17 > 1E-35) ? ((f6 > 2.5) ? ((f15 > 1E-35) ? ((f6 > 5.5) ? -4.1431197758963716 : -3.6323201061697352) : -3.1958282300638658) : ((f44 > 0.195000008) ? ((f6 > 1.5) ? -2.6289826443678317 : -2.0098592950589507) : -2.8591690932041023)) : ((f16 > 1E-35) ? ((f44 > 0.145000011) ? ((f4 > 1E-35) ? -4.6961065826682935 : -3.0368167816891827) : -4.3404531947136045) : -4.7531186141770334)) : ((f44 > 0.135) ? ((f12 > 1E-35) ? ((f6 > 2.5) ? -4.1934780759940891 : ((f5 > 1E-35) ? -4.5853889286978635 : ((f14 > 1E-35) ? -4.5206674460948273 : ((f6 > 1.5) ? -3.4696550769387744 : ((f4 > 1E-35) ? -4.3401247613727048 : ((f44 > 0.265000015) ? -2.0773968928478768 : -2.8233083759993329)))))) : ((f41 > 1E-35) ? ((f44 > 0.295000017) ? -3.2215960321281374 : ((f37 > 1E-35) ? ((f6 > 3.5) ? -4.3197890275608941 : -3.367959889935114) : ((f38 > 0.665) ? -4.1519666532000175 : -4.7340575735217492))) : ((f16 > 1E-35) ? -4.2270974495324127 : -4.798200392723003))) : ((f44 > 0.055) ? ((f12 > 1E-35) ? ((f6 > 3.5) ? -4.5916831229203723 : ((f5 > 1E-35) ? -4.7472526313434225 : -3.9851756258762157)) : ((f37 > 1E-35) ? -4.3758418858919859 : -4.7682315811364893)) : -4.8564259204475553)));
double treeOutput1=((f44 > 0.0950000063) ? ((f0 > 1E-35) ? ((f15 > 1E-35) ? ((f17 > 1E-35) ? 0.51876041347717483 : ((f16 > 1E-35) ? 0.23844915627562097 : -0.52659607248340057)) : 0.76980092847536885) : ((f29 > 1E-35) ? -0.5020202132028635 : ((f44 > 0.215) ? ((f5 > 1E-35) ? -0.24098529859197132 : ((f16 > 1E-35) ? ((f4 > 1E-35) ? -0.10137776731058891 : ((f6 > 5.5) ? 0.1451586409551722 : 0.8463515356275334)) : ((f17 > 1E-35) ? ((f6 > 2.5) ? 0.12749536323616453 : 0.68698006196180861) : ((f59 > 0.245) ? 
-0.11873463129059679 : 0.86305787002510304)))) : ((f5 > 1E-35) ? -0.36133039550442869 : ((f16 > 1E-35) ? ((f4 > 1E-35) ? -0.34410402921705446 : ((f6 > 4.5) ? -0.048935219217333896 : 0.50127621522278965)) : ((f17 > 1E-35) ? ((f6 > 1.5) ? ((f6 > 4.5) ? -0.19454183399514344 : 0.13123999935338701) : 0.75782433708593122) : -0.32936302678924734)))))) : ((f17 > 1E-35) ? ((f6 > 2.5) ? ((f0 > 1E-35) ? 0.12562685499046936 : -0.29842738605892488) : 0.36845134376063815) : ((f44 > 0.035) ? ((f16 > 1E-35) ? ((f5 > 1E-35) ? -0.43778105122047478 : ((f4 > 1E-35) ? -0.44638891439384332 : ((f6 > 4.5) ? -0.23597893791353697 : 0.21150849365673696))) : -0.476604654235808) : -0.49901110336675775))); ... 96 trees removed for terseness ... double treeOutput98=((f49 > 1E-35) ? ((f49 > 0.195000008) ? -0.00022738581909925596 : ((f133 > 0.825) ? -0.20692933604661623 : ((f40 > 0.625) ? ((f133 > 1E-35) ? ((f17 > 1E-35) ? 0.089141794844577016 : ((f54 > 440.5) ? 0.82955987384491381 : 0.23208197986597387)) : ((f92 > 0.585) ? ((f98 > 3.505) ? 3.9131097311894059 : ((f59 > 1E-35) ? -0.12932198048333635 : ((f43 > 1441.5) ? 1.2647107042721135 : 0.12893076124699587))) : 0.15633101295636773)) : ((f48 > 0.335) ? -0.055700624379604977 : ((f38 > 0.425) ? ((f44 > 1E-35) ? ((f17 > 1E-35) ? -0.014964896356048392 : ((f55 > 0.915) ? 1.2596553527026073 : 0.13601358019970178)) : 0.28723981058404496) : 0.0032784726663138272))))) : ((f98 > 10.005) ? ((f24 > 1E-35) ? ((f119 > 0.085) ? -66.897916260618132 : -0.47176219379920842) : ((f4 > 1E-35) ? ((f98 > 10.205) ? -0.4917224738651696 : -61.654792517461935) : ((f17 > 1E-35) ? -0.77446400051294539 : ((f117 > 0.225000009) ? ((f38 > 0.945) ? 0.17514591682039302 : -126.90353223022542) : ((f98 > 10.205) ? -0.20219596910683429 : ((f38 > 0.945) ? -0.014808024296460189 : -93.088923831513839)))))) : ((f43 > 1E-35) ? ((f0 > 1E-35) ? 0.016995799605286254 : ((f13 > 1E-35) ? 
0.23834903159346549 : -0.25445984471866523)) : 0.015090778080967775))); double treeOutput99=((f44 > 0.105000004) ? ((f67 > 0.065) ? ((f49 > 0.655000031) ? 0.001039636731379961 : 0.014692256841068983) : ((f15 > 1E-35) ? -0.13035262122122973 : -0.014448856495556597)) : ((f52 > 0.125) ? ((f57 > 3.005) ? -0.012693473426511207 : ((f59 > 0.565) ? -0.20600958842103584 : ((f17 > 1E-35) ? -0.014763031175560904 : -0.083229482736673785))) : ((f6 > 14.5) ? ((f44 > 0.035) ? ((f57 > 9.865) ? 0.096799475886641409 : -0.02130705234139877) : -0.10826702284820437) : ((f98 > 9.225) ? 0.020569186314723247 : ((f110 > 0.175000012) ? ((f95 > 15.5) ? ((f17 > 1E-35) ? ((f44 > 0.025) ? -0.070712405134995951 : -0.22358255660176043) : -0.0097888396023686052) : -0.007415560000333987) : ((f4 > 1E-35) ? ((f44 > 0.035) ? 0.021272209566017521 : -0.13365597989740136) : ((f44 > 0.0950000063) ? -0.018407071409150662 : ((f5 > 1E-35) ? ((f0 > 1E-35) ? ((f14 > 1E-35) ? -0.47545369183635644 : 0.34057166745111389) : ((f44 > 0.045) ? ((f59 > 0.375) ? -0.042652325967856064 : 0.054950352706337045) : ((f56 > 0.335) ? -0.019974417187712246 : -0.11874998404893158))) : ((f43 > 6252.5) ? ((f16 > 1E-35) ? ((f66 > 0.065) ? 0.43932366063823791 : 0.12451309595592253) : ((f59 > 0.325000018) ? 
0.06508584693583476 : -0.070883674944735769)) : 0.0052035291046983647))))))))); double output = treeOutput0+treeOutput1+treeOutput2+treeOutput3+treeOutput4+treeOutput5+treeOutput6+treeOutput7+treeOutput8+treeOutput9+treeOutput10+treeOutput11+treeOutput12+treeOutput13+treeOutput14+treeOutput15+treeOutput16+treeOutput17+treeOutput18+treeOutput19+treeOutput20+treeOutput21+treeOutput22+treeOutput23+treeOutput24+treeOutput25+treeOutput26+treeOutput27+treeOutput28+treeOutput29+treeOutput30+treeOutput31+treeOutput32+treeOutput33+treeOutput34+treeOutput35+treeOutput36+treeOutput37+treeOutput38+treeOutput39+treeOutput40+treeOutput41+treeOutput42+treeOutput43+treeOutput44+treeOutput45+treeOutput46+treeOutput47+treeOutput48+treeOutput49+treeOutput50+treeOutput51+treeOutput52+treeOutput53+treeOutput54+treeOutput55+treeOutput56+treeOutput57+treeOutput58+treeOutput59+treeOutput60+treeOutput61+treeOutput62+treeOutput63+treeOutput64+treeOutput65+treeOutput66+treeOutput67+treeOutput68+treeOutput69+treeOutput70+treeOutput71+treeOutput72+treeOutput73+treeOutput74+treeOutput75+treeOutput76+treeOutput77+treeOutput78+treeOutput79+treeOutput80+treeOutput81+treeOutput82+treeOutput83+treeOutput84+treeOutput85+treeOutput86+treeOutput87+treeOutput88+treeOutput89+treeOutput90+treeOutput91+treeOutput92+treeOutput93+treeOutput94+treeOutput95+treeOutput96+treeOutput97+treeOutput98+treeOutput99;
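For readers who want to experiment with this kind of export, here is a minimal sketch of its shape. The two trees, their thresholds, and their leaf values are hypothetical and far smaller than the real model, and the sketch indexes a feature vector where the real export uses scalar variables (f0, f1, ...). The function names are illustrative only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tree 0: a nested ternary over feature values that
// evaluates to a leaf value, mirroring the shape of the exported code.
double tree0(const std::vector<double>& f) {
    return (f[0] > 1E-35) ? ((f[1] > 2.5) ? -1.5 : 0.5) : 0.25;
}

// Hypothetical tree 1: a single split with two leaves.
double tree1(const std::vector<double>& f) {
    return (f[1] > 0.5) ? 0.75 : -0.25;
}

// The raw score is the sum of the per-tree outputs, as on the last
// line of the exported file.
double rawScore(const std::vector<double>& f) {
    assert(f.size() >= 2);  // this sketch reads features 0 and 1 only
    return tree0(f) + tree1(f);
}
```

Calling rawScore on a two-element feature vector walks both trees and adds their leaf values. The result is a raw score; turning it into a probability or label is outside what the exported file shows.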
As the exported file shows, the LightGBMBinary model is successfully converted to C++ code. For anyone reading it, each tree is represented as nested ternary operators, then the output from each tree is added together on the last line to get the final predicted score.
Cause
@jualbina : Are you using an older build of ML․NET for converting your model to C++ than you trained with? That would match my repro results.
When converting your model using the current master, I can convert successfully. I would try updating from master, building (./build.sh -Release), and re-running the MAML SaveModel command.
The older version which I tried can't load this model. This is likely due to a change in the model format; for instance, we added a couple of bits to the model to denote new options in the TextLoader to support features like multi-line. This may have caused the older versions to no longer load the newer models. Backward compatibility is guaranteed; forward compatibility is not.
@harishsk et al.: Is there a way to throw a more useful error if a user loads a newer model than is supported by the current ML․NET version? Perhaps a warning, "Model was trained on a newer version of ML․NET. Forward compatibility is not guaranteed. Please update your ML.NET version", anytime a newer model is detected.
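One way to picture the suggested check is a comparison of the version string stored in the model (the leading token of ./TrainingInfo/Version.txt, e.g. 1.5.29002.0) against the version of the tool doing the loading. The sketch below is only an illustration under that assumption. The function names are hypothetical, and how ML․NET actually tracks loader versions is not shown in this thread.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Parse the leading dotted version token, e.g.
// "1.5.29002.0 @BuiltBy: ..." -> {1, 5, 29002, 0}.
// std::stoi throws if a field is not numeric; a real tool would handle that.
std::vector<int> parseVersion(const std::string& text) {
    std::istringstream in(text);
    std::string token;
    in >> token;  // first whitespace-delimited token only
    std::vector<int> parts;
    std::istringstream fields(token);
    std::string field;
    while (std::getline(fields, field, '.')) {
        parts.push_back(std::stoi(field));
    }
    assert(!parts.empty());  // expects at least one numeric field
    return parts;
}

// True when the model was written by a newer build than the runtime.
// std::vector compares lexicographically, which suits dotted versions.
bool modelIsNewer(const std::string& modelVersion,
                  const std::string& runtimeVersion) {
    return parseVersion(modelVersion) > parseVersion(runtimeVersion);
}

// Returns the warning text proposed in the thread, or an empty string.
std::string forwardCompatWarning(const std::string& modelVersion,
                                 const std::string& runtimeVersion) {
    if (!modelIsNewer(modelVersion, runtimeVersion)) {
        return "";
    }
    return "Model was trained on a newer version of ML.NET. "
           "Forward compatibility is not guaranteed. "
           "Please update your ML.NET version";
}
```

A loader could emit this warning before attempting to deserialize, so that a later format exception is easier to interpret.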
I was using TLC/MAML to create my LR model and would then export it to a text file which would show the weights and the bias value. Our subsequent pipeline consumes this file. We are trying to switch over to ML.NET and are noticing that it doesn't have an option to export the same format. Can we please get this functionality?
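To make the request concrete, the sketch below writes a bias and a feature name -> weight table as tab-separated text, sorted by absolute weight. The layout, the "(Bias)" label, and the type and function names are all hypothetical; this is not the TLC/MAML text format, which is not reproduced in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, double> NamedWeight;

// Hypothetical in-memory view of a linear model: a bias plus one
// weight per named feature (for example, an n-gram).
struct LinearModel {
    double bias;
    std::vector<NamedWeight> weights;
};

// Orders weights by descending magnitude.
bool largerMagnitude(const NamedWeight& a, const NamedWeight& b) {
    return std::fabs(a.second) > std::fabs(b.second);
}

// Render the model as "name<TAB>weight" lines, bias first, then the
// features with the largest absolute weight at the top.
std::string toText(LinearModel model) {
    assert(std::isfinite(model.bias));  // avoid writing NaN/inf as the bias
    std::stable_sort(model.weights.begin(), model.weights.end(),
                     largerMagnitude);
    std::ostringstream out;
    out << "(Bias)\t" << model.bias << "\n";
    for (std::size_t i = 0; i < model.weights.size(); ++i) {
        out << model.weights[i].first << "\t"
            << model.weights[i].second << "\n";
    }
    return out.str();
}
```

A downstream pipeline could read such a file line by line and split each line on the tab character. Sorting by magnitude is a design choice for human inspection and does not matter to a parser.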