Closed ddobric closed 3 years ago
Hi @ddobric Given that .NET code runs in a managed memory environment, the memory usage patterns are somewhat non-deterministic and subject to how the garbage collector behaves. In most cases this works out as expected, but occasionally, when a managed object holds references to unmanaged memory, this can be a problem. In this particular case, the image classification trainer relies on TF.NET, which in turn holds on to unmanaged memory in the TensorFlow core.
It is possible to bring some amount of determinism by explicitly disposing of some of the objects that reference TF.NET objects. You can do that by disposing of the unused models in the list of results returned from mlContext.MulticlassClassification.CrossValidate. (That is, call (cvResult.Model as IDisposable)?.Dispose() for all cvResults with index > 1, i.e. the models that you are not using.) And also remember to dispose of the top model after you are done using it.
Also, in your case, since your model relies on TF.NET, you can control the memory usage a bit by disposing of the loaded mlnetModel in the same way.
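A minimal sketch of the disposal pattern described above (the variable names cvResults, trainData, and pipeline are assumptions based on this thread, not code from the repro):

```csharp
// Keep the best model and dispose of the rest, so the unmanaged
// TensorFlow memory held by TF.NET is released deterministically.
var cvResults = mlContext.MulticlassClassification
    .CrossValidate(trainData, pipeline, numberOfFolds: 5)
    .OrderByDescending(r => r.Metrics.MacroAccuracy)
    .ToList();

var bestModel = cvResults[0].Model;
foreach (var result in cvResults.Skip(1))
    (result.Model as IDisposable)?.Dispose();

// ... use bestModel for predictions ...

// When done, dispose of the top model as well.
(bestModel as IDisposable)?.Dispose();
```

The cast through IDisposable is needed because ITransformer itself does not implement IDisposable; only some concrete transformers (such as the TF.NET-backed ones) do.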
You can find more examples of this in TensorflowTests.
Hope that answers your questions.
Hi @harishsk,
thanks for your answer. I agree on the TF.NET and TensorFlow chain regarding memory usage. However, the training code posted above with cvResult is not running in the web application. I showed this code to explain which example is related to my solution. The code that is running in the web application is implemented in the method LoadPool.
That code only performs prediction, and it should not consume GBs of memory. The diagrams shown above have nothing to do with the training.
The web application does the following:
Hope this clarifies better the issue.
Hi @ddobric, Can you please share a small but complete repro that illustrates your problem?
Hi @harishsk,
following @ddobric's issue, the ASP.NET Core application in this repository illustrates the problem. When started, the application opens a browser at the predefined URL, which then triggers the following sequence:
The number of model instances loaded into memory is defined and can be changed in appsettings.json under EnginesConfig.PoolSize. The following is an example snippet of appsettings.json:
{
  ...
  "EnginesConfig": {
    "PoolSize": 5,
    "ModelName": "114_Repcon_KRE_zip"
  }
}
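A sketch of how such a setting might be bound at startup (the EnginesConfig section and its keys come from the snippet above; the options class and its registration are assumptions, not the actual repro code):

```csharp
// Hypothetical options class mirroring the "EnginesConfig" section.
public class EnginesConfig
{
    public int PoolSize { get; set; }
    public string ModelName { get; set; }
}

// In Startup.ConfigureServices: bind the section so PoolSize can be
// injected (via IOptions<EnginesConfig>) where the engines are created.
services.Configure<EnginesConfig>(Configuration.GetSection("EnginesConfig"));
```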
Hi @QuangBui3101 and @ddobric,
Thank you for the repro case. I have been debugging it and evaluating the memory usage. The increase you see on the first call is not unexpected and not due to ML.NET.
If you turn on symbol server and source server in your debugging options, you can follow along the explanation below.
Firstly, ML.NET is designed from the ground up for lazy evaluation. That means tasks such as memory allocation and calculations are deferred until they are actually necessary. So when you load the model and create the prediction engine, not all the necessary memory is allocated right away (e.g. the Classifier object in ImageClassificationTrainer.cs). Those objects are created only on the first call to Predict. But those objects do not contribute to most of the memory usage in this case. The sudden increase of close to 500K that you see on the first call comes almost entirely from the call to Classifier.Score, and specifically from the two calls to ProcessImage and _runner.AddInput.
Each of those calls ends up calling c_api.TF_SessionRun. Almost all the memory increase you see is coming from within the calls into TF.NET and TF: you are seeing an increase of about 500KB per prediction engine, and almost all of that is coming from TF. And with multiple prediction engines instantiated simultaneously, it is reasonable to expect gigabytes of memory usage. It may be possible to optimize the memory usage either in the model or in TF, but that would be outside the scope of this repo.
I have also confirmed your observation that memory consumption remains stable after the peak.
Please let me know if you have any further questions or concerns.
Hi Harish,
thank you so much for your valuable feedback. I totally agree on the lazy load behavior, which is acceptable. You have also confirmed that TF is responsible for the memory consumption, and not directly ML.NET. I expected this. However, in the end we are talking about 500MB and not 500KB. 🤔 That is very strange behavior and not really acceptable. This is the required RAM per user request.
Damir
Sorry, that was a typo from my end. The spikes in memory usage that Visual Studio shows are all in MB. (Not in KB as I wrote above)
This is what I see:
- Before CreatePredictionEngine: 196MB
- After CreatePredictionEngine and before calling Predict: 572MB
- Predict:
  - _imageProcessor.ProcessImage in Classifier.Score in ImageClassificationTrainer.cs: 572MB
  - Before calling c_api.TF_SessionRun in Runner.Run: 572MB
  - After the call above: 668MB
  - After _imageProcessor.ProcessImage and before calling _runner.AddInput: 668MB
  - After c_api.TF_SessionRun in Runner.Run: 668MB

As you can see from the above, the 500+MB memory increase on the first Predict call is almost all coming from the two calls to TF_SessionRun. That seems to be the memory required by TensorFlow to execute inferencing on those models.
@harishsk thanks for your feedback. Now, I assume we have the same understanding of the behaviour.
- That seems to be the memory required by Tensorflow to execute inferencing on those models.
- the 500+MB memory increase
I hope we also agree that 500MB per prediction engine is too much?
We are using this approach for web applications and are also considering mobile devices. The latter is pending, with a dependency on ML.NET mobile support.
The high memory consumption is an extremely limiting factor. What can be done to optimize this?
Damir
When I add memory traces after each of the calls involved, this is what I see:
Memory before Model.Load: 0.0342 GB
Memory after Model.Load: 0.2572 GB
Memory before CreatePredictionEngine: 0.2572 GB
Memory after CreatePredictionEngine: 0.2698 GB
Memory before calling Predict : 0.2698 GB
Memory after calling Predict : 0.6847 GB
So, roughly speaking:
- Memory for model load: 223MB
- Memory for CreatePredictionEngine: 12.6MB
- Memory for Predict: 414.9MB
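One way such traces could be produced is by reading the process working set around each call. This is a sketch, not the actual instrumentation used above; ModelInput, ModelOutput, modelPath, and sampleInput are placeholder names:

```csharp
using System;
using System.Diagnostics;

static void TraceMemory(string label)
{
    // Working set includes the unmanaged TensorFlow allocations,
    // which GC.GetTotalMemory (managed heap only) would miss.
    var process = Process.GetCurrentProcess();
    process.Refresh();
    Console.WriteLine(
        $"Memory {label}: {process.WorkingSet64 / (1024.0 * 1024 * 1024):F4} GB");
}

TraceMemory("before Model.Load");
var mlModel = mlContext.Model.Load(modelPath, out var inputSchema);
TraceMemory("after Model.Load");

TraceMemory("before CreatePredictionEngine");
var engine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(mlModel);
TraceMemory("after CreatePredictionEngine");

TraceMemory("before calling Predict");
var prediction = engine.Predict(sampleInput);
TraceMemory("after calling Predict");
```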
The bulk of the memory is still consumed by the combination of loading the model and running prediction on it. Both those are functions of the size and functionality of the model.
I am afraid I am unable to advise you on how best to optimize your TensorFlowModel to lower the memory used.
I hope that answers your questions. Please feel free to reopen the issue if you have more questions.
To me, it is ok to close the issue if we cannot improve it. But we have to conclude that the high memory consumption of TensorFlow makes hosting models on the web only theoretically possible. Because ML.NET wraps TensorFlow (in this specific case), the issue cannot be fixed inside ML.NET. It has to be done in TensorFlow.
How about tagging it as "Unresolved"?
We have a web application that hosts a trained model to enable users to run prediction scenarios. The solution is based on the ImageClassificationModelTraining.Solution in the machinelearning-samples repo. The training was done by the following code (just a snippet, in case it is important):
To make this work, we have loaded a pool of prediction engine instances, which are assigned to incoming requests. The following code shows how the instances are created on startup.
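A minimal sketch of such a startup pool (ConcurrentQueue, ModelInput, ModelOutput, modelPath, and poolSize are assumptions for illustration, not the actual code from the repro):

```csharp
using System.Collections.Concurrent;
using Microsoft.ML;

// PredictionEngine is not thread-safe, so a pool lets each request
// borrow a dedicated engine instead of sharing one.
var mlContext = new MLContext();
var mlModel = mlContext.Model.Load(modelPath, out _);

var enginePool = new ConcurrentQueue<PredictionEngine<ModelInput, ModelOutput>>();
for (int i = 0; i < poolSize; i++)
{
    enginePool.Enqueue(
        mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(mlModel));
}

// Per request: dequeue an engine, predict, and return it to the pool.
if (enginePool.TryDequeue(out var engine))
{
    var result = engine.Predict(input);
    enginePool.Enqueue(engine);
}
```

Note that ML.NET also ships a PredictionEnginePool service (Microsoft.Extensions.ML) intended for exactly this scenario, which may be a better fit than a hand-rolled queue.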
This part works fine. Next, we measured how much RAM the application needs when deployed to the App Service (or anywhere else). We found that the memory consumption of the trained model is extremely high. The following diagram shows the behaviour of the prediction engine. First, we load a set of prediction engines (in this example 6 instances) using the code shown above.
After loading, some space in RAM is consumed. However, on the first Predict invocation of a particular instance of the prediction engine, there is a peak of 1.5-2.0 GB. After the peak, the memory consumption becomes stable again.
The issue with the peak is that, when it happens, it sometimes causes the App Service health feature to restart the service. Ok, it is not nice, but it can be worked around by using a higher App Service offering. However, it would be good to know where the peak comes from, in case it can grow higher than 2GB.
Another negative observation is the high memory consumption of a single instance of the prediction engine. The following diagram shows the consumption depending on the number of prediction engine instances.
The blue line shows the consumption after the load of the prediction engine, and the green one shows the consumption of the prediction engine instances after the Predict method has been invoked on each of them. The dotted line is the memory consumption as calculated by the formula shown in the diagram. The issue with this behaviour is that the consumption of a single prediction engine instance is approx. 600MB, which is too much. We could easily calculate how much the App Service would cost with just 100 concurrent users. It is too much for this scenario.
We can understand and agree that training is a heavy scenario and might require a lot of memory and CPU resources. However, trained models must be more lightweight.
System information