dotnet / infer

Infer.NET is a framework for running Bayesian inference in graphical models
https://dotnet.github.io/infer/
MIT License

Serialization in .Net 5.0 #371

Closed: YnnamTenob closed this issue 2 years ago

YnnamTenob commented 3 years ago

I recently ran into an issue where saving (serializing) a model in .Net 5.0 (for example, in .Net notebooks or in Azure Functions) fails with a binary serialization error. As far as I understand, this error occurs because of the security concerns around binary serialization. The recommended way of dealing with the binary serializer incompatibility is to implement ISafeSerialization. After reviewing the code base in this repo, that would require modifications all the way down to the Automaton to allow safe serialization of models, particularly models that use incremental learning and therefore need to be saved and serialized in the runtime environment. Is there a solution for this problem other than the obvious one I've mentioned above?

tminka commented 3 years ago

Try one of the other forms of serialization described at How to save distributions to disk.

YnnamTenob commented 3 years ago

Thank you very much @tminka

YnnamTenob commented 2 years ago

Hi @tminka, I tried serializing using the JSON format described in the link you gave above. It serializes an object; however, that object cannot be deserialized.

The type that BayesPointMachineClassifier.CreateBinaryClassifier() creates is a:

CompoundBinaryStandardDataFormatBayesPointMachineClassifier<IList,Int32,.IList,Boolean>

However, CompoundBinaryStandardDataFormatBayesPointMachineClassifier is internal to the library and cannot be deserialized, and as far as I know the interface that it implements cannot be deserialized either:

IBayesPointMachineClassifier<TInstanceSource, TInstance, TLabelSource, TStandardLabel, IDictionary<TStandardLabel, double>, TTrainingSettings, TPredictionSettings>

Here is the JSON that gets serialized. Can you provide further guidance?

{"$id":"1","Capabilities":{"$type":"Microsoft.ML.Probabilistic.Learners.BayesPointMachineClassifierCapabilities, Microsoft.ML.Probabilistic.Learners.Classifier","IsPrecompiled":true,"SupportsMissingData":false,"SupportsSparseData":true,"SupportsStreamedData":false,"SupportsBatchedTraining":true,"SupportsDistributedTraining":false,"SupportsIncrementalTraining":true,"SupportsModelEvidenceComputation":true,"SupportsCustomPredictionLossFunction":true},"Settings":{"$type":"Microsoft.ML.Probabilistic.Learners.BinaryBayesPointMachineClassifierSettings`1[[System.Boolean, System.Private.CoreLib]], Microsoft.ML.Probabilistic.Learners.Classifier","Training":{"ComputeModelEvidence":true,"IterationCount":30,"BatchCount":1},"Prediction":{}},"LogModelEvidence":3.97958442052054E+58,"WeightPosteriorDistributions":[[{"$id":"2","MeanTimesPrecision":626025.5042802754,"Precision":82322.75332875556},{"$id":"3","MeanTimesPrecision":30362.83511852963,"Precision":51130.19043756452},{"$id":"4","MeanTimesPrecision":12859.264686436629,"Precision":29585.013866421086},{"$id":"5","MeanTimesPrecision":32502.261634344814,"Precision":54865.49022045436},{"$id":"6","MeanTimesPrecision":-59793.9994307964,"Precision":51650.08331815232},{"$id":"7","MeanTimesPrecision":-970304.615442721,"Precision":82088.09398443818},{"$id":"8","MeanTimesPrecision":-13736.474440944254,"Precision":1070.108999079218},{"$id":"9","MeanTimesPrecision":0.0,"Precision":2.1356716578201329E-32},{"$id":"10","MeanTimesPrecision":0.0,"Precision":2.1356716578201329E-32},{"$id":"11","MeanTimesPrecision":-1702.6201262384427,"Precision":34903.860626653935},{"$id":"12","MeanTimesPrecision":8658.962632282977,"Precision":37373.63634200442},{"$id":"13","MeanTimesPrecision":-2648.267940949969,"Precision":33252.09404733848},{"$id":"14","MeanTimesPrecision":-1657.4090277754572,"Precision":41051.78020934264},{"$id":"15","MeanTimesPrecision":-8279.877894003368,"Precision":49464.953898461245},{"$id":"16","MeanTimesPrecision":11817.810324595444,"Precision":38585.50544514171},{"$id":"17","MeanTimesPrecision":-998.9685662433118,"Precision":42659.79079697079},{"$id":"18","MeanTimesPrecision":567.713458343229,"Precision":45060.405379882264},{"$id":"19","MeanTimesPrecision":-4866.908171070479,"Precision":33867.34682744422},{"$id":"20","MeanTimesPrecision":-2212.440868850474,"Precision":38654.43235522826},{"$id":"21","MeanTimesPrecision":2033.5012976530375,"Precision":41472.83605039607},{"$id":"22","MeanTimesPrecision":7276.96940891261,"Precision":39560.217427066855},{"$id":"23","MeanTimesPrecision":-474.6755519876042,"Precision":40996.033982061395},{"$id":"24","MeanTimesPrecision":-496.2688900091968,"Precision":41273.181220511135},{"$id":"25","MeanTimesPrecision":-2619.249074880769,"Precision":37094.117068140185},{"$id":"26","MeanTimesPrecision":10753.914960150307,"Precision":36489.15439825997},{"$id":"27","MeanTimesPrecision":0.0,"Precision":2.1356716578201329E-32},{"$id":"28","MeanTimesPrecision":0.0,"Precision":2.1356716578201329E-32},{"$id":"29","MeanTimesPrecision":-3151.8968041883113,"Precision":45325.11342413276},{"$id":"30","MeanTimesPrecision":-1607.6089215679199,"Precision":41735.45510874064},{"$id":"31","MeanTimesPrecision":7864.585933719726,"Precision":38974.62441225201},{"$id":"32","MeanTimesPrecision":-7983.75322009148,"Precision":42422.11953163474},{"$id":"33","MeanTimesPrecision":-390.77278939691513,"Precision":41665.61608825006},{"$id":"34","MeanTimesPrecision":3631.6023830937147,"Precision":40453.76803177451},{"$id":"35","MeanTimesPrecision":-76.39421668042813,"Precision":40111.4519278756},{"$id":"36","MeanTimesPrecision":2424.0668058261167,"Precision":40473.36147362701},{"$id":"37","MeanTimesPrecision":-4011.2590679123623,"Precision":38564.93567352199},{"$id":"38","MeanTimesPrecision":4212.714325147318,"Precision":42481.44758418159},{"$id":"39","MeanTimesPrecision":-3209.426216690391,"Precision":43270.95146322725},{"$id":"40","MeanTimesPrecision":-6828.664619226283,"Precision":41466.37962146438},{"$id":"41","MeanTimesPrecision":-6025.0126948113775,"Precision":41146.139203613944},{"$id":"42","MeanTimesPrecision":-630.2583456838215,"Precision":39252.518898957285},{"$id":"43","MeanTimesPrecision":-3217.7944236199496,"Precision":45003.1837823578},{"$id":"44","MeanTimesPrecision":-2795.307850582982,"Precision":38556.98398877265},{"$id":"45","MeanTimesPrecision":0.0,"Precision":2.1356716578201329E-32},{"$id":"46","MeanTimesPrecision":0.0,"Precision":2.1356716578201329E-32},{"$id":"47","MeanTimesPrecision":4885.973157513242,"Precision":48170.67997592764},{"$id":"48","MeanTimesPrecision":146.57635452268178,"Precision":43660.12391360518},{"$id":"49","MeanTimesPrecision":-620.6493272039843,"Precision":39707.65848153476},{"$id":"50","MeanTimesPrecision":-1020.2163019830665,"Precision":38516.18540140339},{"$id":"51","MeanTimesPrecision":1637.335214187853,"Precision":47097.1747684993},{"$id":"52","MeanTimesPrecision":-156.0525226517272,"Precision":43713.92351849498},{"$id":"53","MeanTimesPrecision":47.991145678774004,"Precision":44963.71262570452},{"$id":"54","MeanTimesPrecision":502.57591237162745,"Precision":43158.944587158614},{"$id":"55","MeanTimesPrecision":1077.817943476354,"Precision":37157.60540980723},{"$id":"56","MeanTimesPrecision":2606.6733501357394,"Precision":43503.10519271262},{"$id":"57","MeanTimesPrecision":-6082.686575452986,"Precision":43900.58114542141},{"$id":"58","MeanTimesPrecision":413.52291401773243,"Precision":45070.26501218828},{"$id":"59","MeanTimesPrecision":2531.300641271793,"Precision":42414.85495110738},{"$id":"60","MeanTimesPrecision":-190.69089873335562,"Precision":41727.10464514889},{"$id":"61","MeanTimesPrecision":-2948.5967275258836,"Precision":42778.8320003398},{"$id":"62","MeanTimesPrecision":71.35541670046288,"Precision":36509.54611718066},{"$id":"63","MeanTimesPrecision":8392763.756970031,"Precision":2605902.5064716735},{"$id":"64","MeanTimesPrecision":-1135155.6708183656,"Precision":83157.7942655128}]]}

tminka commented 2 years ago

How were you able to serialize that type using Json? It isn't designed to do that.

YnnamTenob commented 2 years ago

@tminka, sorry for the delay; I was out on PTO. Here is what I did:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;
using Microsoft.ML.Probabilistic.Collections;
using Microsoft.ML.Probabilistic.Learners;
using Microsoft.ML.Probabilistic.Learners.BayesPointMachineClassifierInternal;
using Microsoft.ML.Probabilistic.Math;

var modelFile = "IdentifierSearchScoringModel.json";

// Serializes Infer.NET collection types (Vector, Matrix, IArray<>, ISparseList<>)
// as JSON objects instead of JSON arrays.
class CollectionAsObjectResolver : DefaultContractResolver
{
    private static readonly HashSet<Type> SerializeAsObjectTypes = new HashSet<Type>
    {
        typeof(Vector), typeof(Matrix), typeof(IArray<>), typeof(ISparseList<>)
    };

    private static readonly ConcurrentDictionary<Type, JsonContract> ResolvedContracts =
        new ConcurrentDictionary<Type, JsonContract>();

    public override JsonContract ResolveContract(Type type) =>
        ResolvedContracts.GetOrAdd(type, this.ResolveContractInternal);

    private JsonContract ResolveContractInternal(Type type) =>
        IsExcludedType(type) ? this.CreateObjectContract(type) : this.CreateContract(type);

    // True if the type, its base type, or any interface it implements is listed above.
    private static bool IsExcludedType(Type type)
    {
        if (type == null) return false;
        if (SerializeAsObjectTypes.Contains(type)) return true;
        if (type.IsGenericType && SerializeAsObjectTypes.Contains(type.GetGenericTypeDefinition())) return true;
        return IsExcludedType(type.BaseType) || type.GetInterfaces().Any(IsExcludedType);
    }
}

var serializerSettings = new JsonSerializerSettings
{
    TypeNameHandling = TypeNameHandling.Auto,
    ContractResolver = new CollectionAsObjectResolver(),
    PreserveReferencesHandling = PreserveReferencesHandling.Objects
};
var serializer = JsonSerializer.Create(serializerSettings);

// ClassifierMapping, trainResult, simulationPath and dataVersion are defined elsewhere in the notebook.
var mapping = new ClassifierMapping();
var classifier = BayesPointMachineClassifier.CreateBinaryClassifier(mapping);
classifier.Settings.Training.ComputeModelEvidence = true;

// Train the Bayes Point Machine classifier
classifier.Train(trainResult.features, trainResult.labels);

// Write to disk; the using blocks flush and close the writers and the file.
using (var stream = new FileStream($"{simulationPath}/{dataVersion}/Engineered/{modelFile}", FileMode.Create))
using (var streamWriter = new StreamWriter(stream))
using (var jsonWriter = new JsonTextWriter(streamWriter))
{
    serializer.Serialize(jsonWriter, classifier);
}
```

YnnamTenob commented 2 years ago

So if it isn't designed to do that, how can I serialize a BPM model in .Net 5.0?

YnnamTenob commented 2 years ago

Hi @tminka,

Do you have any advice here? I know this is not your problem, but I am coming up on a hard deadline, and if I am not able to deserialize the model in .Net 5.0, I will have to scrap this approach.

Best, MB

tminka commented 2 years ago

The code that you sent doesn't work for me. As far as I can tell, Json.NET cannot serialize these classes at all. That is why I was confused how you got that output. The Learner classes all implement custom binary serialization. They would have to be changed to support any other form of serialization.
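One possible stopgap while the Learner classes only support their custom binary serialization (an assumption of this edit, not something proposed in the thread) is to persist just the weight posteriors, which the JSON above exposes as MeanTimesPrecision/Precision pairs, in a plain data-transfer type. The sketch below uses System.Text.Json, which ships with .NET 5; the type names `GaussianParameters`, `BpmWeightPosteriors`, and `BpmWeightStore` are hypothetical. It saves numbers only: it does not reconstruct a classifier object, so it is no substitute for saving the model to continue incremental training.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO: one Gaussian weight posterior in natural parameters,
// mirroring the MeanTimesPrecision/Precision fields in the serialized JSON.
public sealed class GaussianParameters
{
    public double MeanTimesPrecision { get; set; }
    public double Precision { get; set; }
}

// Hypothetical container for the per-feature weight posteriors of a binary BPM.
public sealed class BpmWeightPosteriors
{
    public List<GaussianParameters> Weights { get; set; } = new List<GaussianParameters>();
}

public static class BpmWeightStore
{
    // Allow "Infinity"/"NaN" so that infinite precisions do not make the serializer throw.
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        NumberHandling = JsonNumberHandling.AllowNamedFloatingPointLiterals
    };

    public static string ToJson(BpmWeightPosteriors posteriors) =>
        JsonSerializer.Serialize(posteriors, Options);

    public static BpmWeightPosteriors FromJson(string json) =>
        JsonSerializer.Deserialize<BpmWeightPosteriors>(json, Options)
        ?? throw new InvalidOperationException("JSON did not contain weight posteriors.");
}
```

Because the DTO is a plain public class with a parameterless constructor, it round-trips without type-name metadata, unlike the internal classifier type discussed above.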

YnnamTenob commented 2 years ago

Thanks. I ran that code in a .Net notebook on Azure Machine Learning. It serializes all of the child classes but does not serialize the top-level class. One final question: since I have the MeanTimesPrecision and Precision values, can I use the generated algorithm to do inference?

tminka commented 2 years ago

Yes, if you are just making predictions then you can use the generated algorithm directly.
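For predictions only, the stored parameters can also be used by hand. The sketch below does not call Infer.NET's generated algorithm; it computes a closed-form predictive probability under stated assumptions: factorized Gaussian weight posteriors given as (MeanTimesPrecision, Precision) with finite precision, a dense feature vector in the same order as the weights (including any bias feature added to the data), and Gaussian score noise with variance 1. The noise variance is an assumption about the model, and whether this matches the compound variant created by `CreateBinaryClassifier` exactly is not confirmed here. With mean m_i = MeanTimesPrecision_i / Precision_i and variance v_i = 1 / Precision_i, it evaluates P(true | x) = Φ(Σ m_i x_i / sqrt(Σ v_i x_i² + σ²)). `BpmPrediction` and its methods are hypothetical names.

```csharp
using System;

public static class BpmPrediction
{
    // Natural parameters to moments: mean = MeanTimesPrecision / Precision, variance = 1 / Precision.
    // Assumes finite precision (point masses are not handled).
    public static (double Mean, double Variance) ToMeanAndVariance(double meanTimesPrecision, double precision) =>
        (meanTimesPrecision / precision, 1.0 / precision);

    // P(label = true | x) = Phi(scoreMean / sqrt(scoreVariance)), where the score w.x is Gaussian
    // with variance inflated by noiseVariance (1.0 by default, an assumed value).
    public static double ProbabilityTrue(
        double[] meanTimesPrecision, double[] precision, double[] features, double noiseVariance = 1.0)
    {
        if (meanTimesPrecision.Length != features.Length || precision.Length != features.Length)
            throw new ArgumentException("Weight and feature arrays must have the same length.");

        double scoreMean = 0.0;
        double scoreVariance = noiseVariance;
        for (int i = 0; i < features.Length; i++)
        {
            if (features[i] == 0.0) continue; // zero-valued features add nothing to the score
            var (mean, variance) = ToMeanAndVariance(meanTimesPrecision[i], precision[i]);
            scoreMean += mean * features[i];
            scoreVariance += variance * features[i] * features[i];
        }
        return StandardNormalCdf(scoreMean / Math.Sqrt(scoreVariance));
    }

    // Standard normal CDF via the Abramowitz-Stegun 7.1.26 approximation to erf
    // (absolute error in erf below about 1.5e-7).
    public static double StandardNormalCdf(double z)
    {
        double x = Math.Abs(z) / Math.Sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                      + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.Exp(-x * x);
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }
}
```

Weights with near-zero precision (such as the entries with Precision 2.1356716578201329E-32 in the JSON above) have an enormous variance, so any nonzero value for those features pushes the predicted probability toward 0.5.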

tminka commented 2 years ago

I have created PR #373 which adds the ability to serialize BayesPointMachineClassifiers as text.

YnnamTenob commented 2 years ago

@tminka Thank You. This is great.

YnnamTenob commented 2 years ago

Hi @tminka,

When will PR #373 make it into a released NuGet package?

tminka commented 2 years ago

I will update the NuGet package this week.