dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Hosting a pre-trained ML Model in Azure Functions #569

Closed HowardvanRooijen closed 5 years ago

HowardvanRooijen commented 6 years ago

Hi,

ML.NET + Azure Functions seems like the perfect combination. There's only one problem: ML.NET is x64 and Functions is x86. There is an x64 version of the Functions runtime available (https://github.com/Azure/azure-functions-core-tools/releases), but there is no documentation on how to set it up locally or configure it in Azure.

I have many scenarios where we'd like to process incoming data / files using ML.NET pre-trained models, combined with Function Bindings so they run automatically when new data is added. (I've been working on a sample that combines Cognitive Services OCR to extract bank transactions from screengrabs and then uses an ML.NET model to classify the bank transactions into spend categories, but it all falls down when you try to host the model in Functions.)

Have you ever tried this hosting scenario? Can you reach out to the Functions team internally? (I've raised the question, but have had no response.)

It seems like the perfect combination of technologies - but frustratingly it just doesn't work (yet).

loflet commented 6 years ago

Hello Howard,

I haven't had much hands-on time with ML.NET yet, but I have worked with Azure Machine Learning Workbench. I trained a model for a classification problem and created a REST service that was hosted on Azure. I also consumed that service as a REST API to get predictions from a web interface or phone app. This is the sample I tried running and hosting on Azure - https://docs.microsoft.com/en-us/azure/machine-learning/desktop-workbench/scenario-image-classification-using-cntk

I am not sure whether the above will help you, but I just wanted to share this information with you.

Regards, Sanket Ghorpade

HowardvanRooijen commented 6 years ago

Hi, thanks for that.

Yes, I'm very familiar with Machine Learning Studio (a fantastic service, ahead of its time, beloved by our customers who have outgrown Excel for data analysis), Machine Learning Workbench and Model Management. One of the problems we've always had productionizing these ML models is creating a hosting environment that is cost effective. There are many scenarios where it would be ideal to streamline the model evaluation process inside an application or workflow. ML.NET has suddenly opened up a new world of possibilities by allowing us to write ML models in .NET and use standard compute infrastructure (without any complicated 3rd-party frameworks) to run them. Functions, and especially Durable Functions, allow us to create more complicated workflows (with the nice side effect of being able to fan out) that can incorporate ML models, all at a serverless price point - paying per invocation rather than having to pay to stand up a VM / container / Machine Learning Studio instance. I think we're not far away from being able to run ML.NET in Functions - and that's a very, very exciting opportunity.

Regards,

Howard

loflet commented 6 years ago

Totally agree with you - a proper framework that combines different algorithms in one place is awesome. Exciting journey ahead :)

eerhardt commented 6 years ago

@HowardvanRooijen,

I've spent some time investigating how to make this work (including some time with the Azure Functions team). We've narrowed down the issues, and I am successfully able to run our CoreFX GitHub Labeler in an Azure Function.

  1. You need to get an Azure Function that runs x64. See https://github.com/Azure/Azure-Functions/issues/651#issuecomment-409001093.
  2. We have some dependency injection issues, which can be worked around by adding code like the following to the beginning of the function (a fuller sketch of where this sits in a function follows at the end of this comment):
            // Referencing these ML.NET types up front forces their assemblies to load
            // before the component catalog needs them; the condition is never true at
            // runtime, so the log line below never actually fires.
            if (typeof(Microsoft.ML.Runtime.Data.LoadTransform) == null ||
                typeof(Microsoft.ML.Runtime.Learners.LinearClassificationTrainer) == null ||
                typeof(Microsoft.ML.Runtime.Internal.CpuMath.SseUtils) == null)
            {
                log.Info("Assemblies are not loaded perfectly");
            }

See #559, which is tracking this issue.

  3. We discovered some issues between the way ML.NET loads assemblies and the way Azure Functions loads assemblies. Basically, ML.NET needs to stop using Assembly.LoadFrom and instead load assemblies in a way that works with custom AssemblyLoadContexts (e.g. the way Azure Functions loads them).

Leaving this issue open to track fixing (3) above - Fixing Assembly.LoadFrom in ML.NET.
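
For anyone wiring this up, here's a rough sketch (hypothetical function name, trigger and logger - not the Labeler code itself) of where the workaround from (2) sits inside a function:

    [FunctionName("CategorizeTransaction")]
    public static void Run(
        [QueueTrigger("incoming-transactions")] string message,   // hypothetical trigger
        TraceWriter log)
    {
        // The workaround from (2) goes first, before any other ML.NET usage.
        if (typeof(Microsoft.ML.Runtime.Data.LoadTransform) == null ||
            typeof(Microsoft.ML.Runtime.Learners.LinearClassificationTrainer) == null ||
            typeof(Microsoft.ML.Runtime.Internal.CpuMath.SseUtils) == null)
        {
            log.Info("Assemblies are not loaded perfectly");
        }

        // ... load the model and run predictions as usual ...
    }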

HowardvanRooijen commented 6 years ago

@eerhardt

That is absolutely amazing work!

I can confirm that my model now runs!

2018-07-31T22:03:58  Welcome, you are now connected to log-streaming service.
2018-07-31T22:04:14.994 [Information] Executing 'TransactionCategorizationFunction' (Reason='This function was programmatically called via the host APIs.', Id=8d3d7f37-0e83-4abd-8f76-1dad4c333638)
2018-07-31T22:04:16.932 [Information] C# Queue trigger function processed: Endjin.FreeAgent.Functions.ExpensesOcr.Transaction
2018-07-31T22:04:16.932 [Information] Download ML Model from Blob Storage
2018-07-31T22:04:16.933 [Information] Load ML Model
2018-07-31T22:04:26.263 [Information] Transaction Description: NON-STERLING TRANSACTION FEE
2018-07-31T22:04:27.057 [Information] Predicted Category: Bank/Finance Charges
2018-07-31T22:04:27.068 [Information] Executed 'TransactionCategorizationFunction' (Succeeded, Id=8d3d7f37-0e83-4abd-8f76-1dad4c333638)
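
For anyone trying the same thing, the function is shaped roughly like this - a simplified sketch with hypothetical type, queue and blob names (the real code differs), using the legacy PredictionModel API:

    using System.IO;
    using System.Threading.Tasks;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Extensions.Logging;
    using Microsoft.ML.Legacy;          // PredictionModel lives here in the 0.x packages
    using Microsoft.ML.Runtime.Api;     // ColumnName attribute (namespace varies by version)

    public class Transaction            // hypothetical input type
    {
        public string Description { get; set; }
    }

    public class CategoryPrediction     // hypothetical output type
    {
        [ColumnName("PredictedLabel")]
        public string Category { get; set; }
    }

    public static class TransactionCategorizationFunction
    {
        [FunctionName("TransactionCategorizationFunction")]
        public static async Task Run(
            [QueueTrigger("transactions")] Transaction transaction,
            [Blob("models/transaction-categorizer.zip", FileAccess.Read)] Stream modelBlob,
            ILogger log)
        {
            log.LogInformation("Download ML Model from Blob Storage");
            var modelStream = new MemoryStream();
            await modelBlob.CopyToAsync(modelStream);
            modelStream.Position = 0;

            log.LogInformation("Load ML Model");
            var model = await PredictionModel.ReadAsync<Transaction, CategoryPrediction>(modelStream);

            log.LogInformation($"Transaction Description: {transaction.Description}");
            var prediction = model.Predict(transaction);
            log.LogInformation($"Predicted Category: {prediction.Category}");
        }
    }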

Do you think this Assembly Loading issue is just limited to Functions? We've written a Dependency Injection framework and we tend to find that there are different assembly loading issues on each of the Azure Services (App Service, Cloud Services, Service Fabric, Functions) as they all have their own nuances.

Thanks very much for spending the time solving this issue - I think the combination of ML.NET + Functions is going to be amazing.

I'll write a blog post demonstrating all of this.

Thanks again - you've made me very happy!

Howard

eerhardt commented 6 years ago

Do you think this Assembly Loading issue is just limited to Functions?

No, I believe this issue will come up whenever an app is using a custom AssemblyLoadContext, and ML.NET is loaded in the non-default context.
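
To make that concrete, here's a toy console sketch (assumed file name, not the Functions host code) of the two load paths colliding:

    using System;
    using System.IO;
    using System.Reflection;
    using System.Runtime.Loader;

    // The host loads app assemblies into its own AssemblyLoadContext, while
    // Assembly.LoadFrom always loads into the default context, so the same file
    // ends up loaded twice and its types don't match across the two contexts.
    class CustomLoadContext : AssemblyLoadContext
    {
        // Returning null falls back to the default context for anything we don't handle.
        protected override Assembly Load(AssemblyName assemblyName) => null;
    }

    class Program
    {
        static void Main()
        {
            string path = Path.GetFullPath("Microsoft.ML.dll");   // assumed to sit next to the app

            var custom = new CustomLoadContext();
            Assembly inCustom = custom.LoadFromAssemblyPath(path);

            // ML.NET's old code path: Assembly.LoadFrom resolves into the default context.
            Assembly inDefault = Assembly.LoadFrom(path);

            // Same file on disk, two distinct Assembly identities.
            Console.WriteLine(ReferenceEquals(inCustom, inDefault));   // False
        }
    }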

HowardvanRooijen commented 6 years ago

BTW we're just about to test ML.NET inside Service Fabric - I'll let you know how that goes.

FernandoNunes commented 6 years ago

Any info on how ML.NET runs inside Service Fabric? My thought is that it should all go well since Service Fabric is x64, right?

HowardvanRooijen commented 6 years ago

@eerhardt has this been improved at all in the 0.5 release?

eerhardt commented 6 years ago

@HowardvanRooijen - I've got good news and bad news. And I always start with the bad news.

The bad news is that, no, unfortunately this hasn't changed at all in the 0.5 release; you will still need the above workarounds if you are using 0.5.

The good news is that I am actively working on fixing the underlying architectural issues with the dependency injection/component catalog for 0.6. See #208. My work will fix the assembly loading issues, and you will no longer be required to add the typeof workarounds listed above to get this scenario to work.

HowardvanRooijen commented 6 years ago

Thanks for the update @eerhardt

Can't wait - at the moment I can't get my end-to-end Durable Functions / Cognitive Services / ML.NET expenses OCR / classification / automation sample working, because I think ML.NET is causing assembly loading errors.

I see the following getting logged:

Duplicate loading of the assembly 'AzureFunctions.Extensions.CognitiveServices, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'
Duplicate loading of the assembly 'DurableTask.AzureStorage, Version=1.3.1.0, Culture=neutral, PublicKeyToken=d53979610a6e89dd'
Duplicate loading of the assembly 'DurableTask.Core, Version=2.0.8.0, Culture=neutral, PublicKeyToken=d53979610a6e89dd'
Duplicate loading of the assembly 'Endjin.FreeAgent.Domain, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'
Duplicate loading of the assembly 'Endjin.FreeAgent.Expenses.MachineLearning, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'
Duplicate loading of the assembly 'Endjin.FreeAgent.Expenses.Workflow, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'
Duplicate loading of the assembly 'Microsoft.Azure.KeyVault, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35'
Duplicate loading of the assembly 'Microsoft.Azure.WebJobs.Extensions.DurableTask, Version=1.0.0.0, Culture=neutral, PublicKeyToken=014045d636e89289'
Duplicate loading of the assembly 'Microsoft.Azure.WebJobs.Extensions.Storage, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35'
Duplicate loading of the assembly 'Microsoft.ML.Api, Version=1.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'
Duplicate loading of the assembly 'Microsoft.ML.Core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'
Duplicate loading of the assembly 'Microsoft.ML.Data, Version=1.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'
Duplicate loading of the assembly 'Microsoft.ML, Version=1.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'
Duplicate loading of the assembly 'Polly, Version=6.0.0.0, Culture=neutral, PublicKeyToken=c8a3ffc3f8f825cc'

and then I get a runtime exception inside the Durable Function activity:

System.Private.CoreLib: Exception while executing function: UploadExpense. Endjin.FreeAgent.Expenses.Workflow: Method not found: 'System.Threading.Tasks.Task`1<Endjin.FreeAgent.Domain.Expense> Endjin.FreeAgent.Client.FreeAgentClient.CreateNewExpenseAsync(Endjin.FreeAgent.Domain.Expense)'.
[19/09/2018 21:33:47] 3d6ac3bfcbdf4fdeac2e243e2b350377: Function 'UploadExpense (Activity)' failed with an error. Reason: System.MissingMethodException: Method not found: 'System.Threading.Tasks.Task`1<Endjin.FreeAgent.Domain.Expense> Endjin.FreeAgent.Client.FreeAgentClient.CreateNewExpenseAsync(Endjin.FreeAgent.Domain.Expense)'.

eerhardt commented 5 years ago

This is now working as expected using the latest nightly build: 0.6.0-preview-26929-2. I no longer need the workarounds listed above. ML.NET is now loading assemblies in a manner that is supported by Azure Functions.

Fixed by #970

HowardvanRooijen commented 5 years ago

Hi,

I couldn't find 0.6.0-preview-26929-2 in the daily builds feed https://dotnet.myget.org/F/dotnet-core/api/v3/index.json - is that the right location?

I took a copy of the latest build from that feed, 0.7.0-preview-27001-4, and tried it inside my function, but got the following exception:

Executed 'PredictCategory' (Failed, Id=0a5fee89-d875-4b46-8ccf-5d9b947dde7c)
System.Private.CoreLib: Exception while executing function: PredictCategory. System.Private.CoreLib: Exception has been thrown by the target of an invocation. Microsoft.ML.Data: One of the identified items was in an invalid format.
6df01db0987a4f3d80c37d3f4fc6c458: Function 'PredictCategory (Activity)' failed with an error. Reason: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.FormatException: One of the identified items was in an invalid format.
   at Microsoft.ML.Runtime.Data.ConcatTransform.ColumnInfo..ctor(ModelLoadContext ctx)
   at Microsoft.ML.Runtime.Data.ConcatTransform..ctor(IHostEnvironment env, ModelLoadContext ctx)
   at Microsoft.ML.Runtime.Data.ConcatTransform.Create(IHostEnvironment env, ModelLoadContext ctx, IDataView input)
   --- End of inner exception stack trace ---
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Microsoft.ML.Runtime.ComponentCatalog.LoadableClassInfo.CreateInstanceCore(Object[] ctorArgs)
   at Microsoft.ML.Runtime.ComponentCatalog.TryCreateInstance[TRes](IHostEnvironment env, Type signatureType, TRes& result, String name, String options, Object[] extra)
   at Microsoft.ML.Runtime.ComponentCatalog.TryCreateInstance[TRes,TSig](IHostEnvironment env, TRes& result, String name, String options, Object[] extra)
   at Microsoft.ML.Runtime.Model.ModelLoadContext.TryLoadModelCore[TRes,TSig](IHostEnvironment env, TRes& result, Object[] extra)
   at Microsoft.ML.Runtime.Model.ModelLoadContext.TryLoadModel[TRes,TSig](IHostEnvironment env, TRes& result, RepositoryReader rep, Entry ent, String dir, Object[] extra)
   at Microsoft.ML.Runtime.Model.ModelLoadContext.LoadModel[TRes,TSig](IHostEnvironment env, TRes& result, RepositoryReader rep, Entry ent, String dir, Object[] extra)
   at Microsoft.ML.Runtime.Model.ModelLoadContext.LoadModelOrNull[TRes,TSig](IHostEnvironment env, TRes& result, RepositoryReader rep, String dir, Object[] extra)
   at Microsoft.ML.Runtime.Model.ModelLoadContext.LoadModel[TRes,TSig](IHostEnvironment env, TRes& result, String name, Object[] extra)
   at Microsoft.ML.Runtime.Data.CompositeDataLoader.LoadSelectedTransforms(ModelLoadContext ctx, IDataView srcView, IHostEnvironment env, Func`2 isTransformTagAccepted)
   at Microsoft.ML.Runtime.Model.ModelFileUtils.LoadTransforms(IHostEnvironment env, IDataView data, RepositoryReader rep)
   at Microsoft.ML.Runtime.Model.ModelFileUtils.LoadTransforms(IHostEnvironment env, IDataView data, Stream modelStream)
   at Microsoft.ML.Runtime.Api.DataViewConstructionUtils.LoadPipeWithPredictor(IHostEnvironment env, Stream modelStream, IDataView view)
   at Microsoft.ML.Runtime.Api.BatchPredictionEngine`2..ctor(IHostEnvironment env, Stream modelStream, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.Runtime.Api.ComponentCreation.CreateBatchPredictionEngine[TSrc,TDst](IHostEnvironment env, Stream modelStream, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
   at Microsoft.ML.Legacy.PredictionModel.ReadAsync[TInput,TOutput](Stream stream)
   at Endjin.FreeAgent.Expenses.Workflow.ExpenseProcessing.ExpenseProcessingActivities.PredictCategory(Transaction transaction, ILogger logger) in C:\_Projects\Tools\Endjin.FreeAgent\Solutions\Endjin.FreeAgent.Expenses.Workflow\ExpenseProcessing\ExpenseProcessingActivities.cs:line 47
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionInvoker`2.InvokeAsync(Object instance, Object[] arguments) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionInvoker.cs:line 63
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.InvokeAsync(IFunctionInvoker invoker, ParameterHelper parameterHelper, CancellationTokenSource timeoutTokenSource, CancellationTokenSource functionCancellationTokenSource, Boolean throwOnTimeout, TimeSpan timerInterval, IFunctionInstance instance) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 561
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithWatchersAsync(IFunctionInstance instance, ParameterHelper parameterHelper, ILogger logger, CancellationTokenSource functionCancellationTokenSource) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 508
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(IFunctionInstance instance, ParameterHelper parameterHelper, IFunctionOutputDefinition outputDefinition, ILogger logger, CancellationTokenSource functionCancellationTokenSource) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 444
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(IFunctionInstance instance, FunctionStartedMessage message, FunctionInstanceLogEntry instanceLogEntry, ParameterHelper parameterHelper, ILogger logger, CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 249. IsReplay: False. State: Failed. HubName: DurableFunctionsHub. AppName: . SlotName: . ExtensionVersion: 1.6.2. SequenceNumber: 7.

eerhardt commented 5 years ago

I couldn't find 0.6.0-preview-26929-2 in the daily builds feed https://dotnet.myget.org/F/dotnet-core/api/v3/index.json - is that the right location?

That version is in dotnet.myget.org here: https://dotnet.myget.org/feed/dotnet-core/package/nuget/Microsoft.ML/0.6.0-preview-26929-2

I took a copy of the latest build from that feed, 0.7.0-preview-27001-4, and tried it inside my function, but got the following exception:

I've seen this error before when trying to load an older model with the newer code. Can you try re-building the model with the same version to see if it repros?

Then please log a new bug for this scenario.

HowardvanRooijen commented 5 years ago

Rebuilding the model works!!! (A more helpful exception message might help)

IT'S ALIVE!

Seriously, thanks for all the hard work on this. I've just tested it end-to-end and it works as expected.

Now I just need to update the model I've built to use the new API rather than the "legacy" API!
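
(For anyone reading along later, the non-legacy API looks roughly like this - a sketch with hypothetical type names, based on the MLContext-based API in more recent releases:)

    using Microsoft.ML;
    using Microsoft.ML.Data;

    public class Transaction
    {
        public string Description { get; set; }
    }

    public class CategoryPrediction
    {
        [ColumnName("PredictedLabel")]
        public string Category { get; set; }
    }

    public static class Categorizer
    {
        public static string PredictCategory(string modelPath, Transaction transaction)
        {
            var mlContext = new MLContext();

            // Load the saved model; the schema it was trained against comes back too.
            ITransformer model = mlContext.Model.Load(modelPath, out DataViewSchema inputSchema);

            // Prediction engine for single examples (not thread-safe - in a Function
            // you'd cache/pool this rather than rebuild it per invocation).
            var engine = mlContext.Model.CreatePredictionEngine<Transaction, CategoryPrediction>(model);

            return engine.Predict(transaction).Category;
        }
    }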

Thanks again,

Howard

veikkoeeva commented 5 years ago

@HowardvanRooijen This is likely a long stretch, but do drop a link here if you write a blog post or anything. Would really like to see something about this. :)

eerhardt commented 5 years ago

Rebuilding the model works!!! (A more helpful exception message might help)

Thanks for the confirmation, @HowardvanRooijen. Can you do me a favor and log a new issue for loading a model that was built with the old version using the new version? Please also show the steps you took to build the model with the old version. I believe there is something wrong with the ConcatTransform - it isn't respecting backwards compatibility like it should. Thanks.