Open CESARDELATORRE opened 5 years ago
I could not repro this on my machine.
This simple scoring sample reproduces it: https://github.com/dotnet/machinelearning-samples/tree/migration/1.4/samples/csharp/getting-started/DeepLearning_ImageClassification_Training/ImageClassification.Predict
The trained model is created in a project within the same solution, using the default settings of the new API.
I see the same behavior for the first prediction when using the PredictionEnginePool in ASP.NET Core apps. But the sample above is simply a console app doing a prediction.
Mine's around 5 seconds. See output below.
Loading model from: C:\Dev\machinelearning-samples\samples\csharp\getting-started\DeepLearning_ImageClassification_Training\ImageClassification.Predict\bin\Debug\netcoreapp2.1\../../../assets\inputs\MLNETModel\imageClassifier.zip
2019-11-01 14:41:34.285528: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
First Prediction took: 5301mlSecs
Second Prediction took: 165mlSecs
Image Filename : [BlackRose.png], Predicted Label : [roses], Probability : [0.9992207]
Predicting several images...
Image Filename : [BlackRose.png], Predicted Label : [roses], Probability : [0.9992207]
Image Filename : [classic-daisy.jpg], Predicted Label : [daisy], Probability : [0.9997143]
Image Filename : [classic-tulip.jpg], Predicted Label : [tulips], Probability : [0.9992585]
Image Filename : [RareThreeSpiralledRose.png], Predicted Label : [roses], Probability : [1]
Press any key to end the app..
Interestingly, when using different architectures, the scoring time for the first prediction differs significantly (probably because the initialization time for each model is different):
- ResnetV250 (using CPU): first prediction took 5 s, second prediction took 150 ms
- ResnetV2101 (using CPU): first prediction took 3 s, second prediction took 319 ms
- MobilenetV2 (using CPU): first prediction took 1.2 s, second prediction took 63 ms
But in all cases, the difference between the first prediction and subsequent predictions is significant.
Note that the sample loads the model from a .zip file, starting from scratch, and then predicts.
Mine's around 5 seconds.
Mine as well. 5 seconds on first prediction. ~130ms for subsequent.
@ashbhandare is going to be investigating this.
I have investigated the issue, and it seems like the time difference between the first and second prediction is in the Tensorflow session run call as suspected. This is known Tensorflow behavior, as tf does some initialization and graph optimization passes in the first session run.
In order to mask this time within our prediction engine creation, I have a working solution where a session run is executed on a fake tensor during initialization, thus making the subsequent actual .Predict calls take similar time. I am further investigating if there is a better way of initializing the Tensorflow session.
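The idea in the proposed fix — paying the one-time session cost during engine creation by running on a fake tensor — can be sketched generically. The snippet below is a Python stand-in, not ML.NET or TensorFlow code: `SlowFirstRunSession`, `PredictionEngine`, and `warm_up` are illustrative names, and the one-time graph-optimization cost is simulated with a sleep.

```python
import time

class SlowFirstRunSession:
    """Stub for a TF-style session whose first run pays a one-time
    graph-optimization/allocation cost (simulated with a sleep)."""
    def __init__(self):
        self._optimized = False

    def run(self, tensor):
        if not self._optimized:
            time.sleep(0.5)        # stand-in for graph optimization passes
            self._optimized = True
        return sum(tensor)         # trivial stand-in for inference

class PredictionEngine:
    def __init__(self, warm_up=True):
        self._session = SlowFirstRunSession()
        if warm_up:
            # Mask the one-time cost inside engine creation by running
            # the session once on a fake tensor, as the fix proposes.
            self._session.run([0.0])

    def predict(self, tensor):
        start = time.perf_counter()
        result = self._session.run(tensor)
        return result, time.perf_counter() - start

engine = PredictionEngine(warm_up=True)
_, first = engine.predict([1.0, 2.0])
_, second = engine.predict([1.0, 2.0])
# With warm-up, the first real prediction is no slower than the second.
print(f"first={first:.3f}s second={second:.3f}s")
```

With `warm_up=False` the first `predict` call absorbs the whole initialization cost, matching the behavior reported in this issue.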
It's reasonable to assume the first run is slow: the first pass always takes longer because the backend is allocating memory and doing similar setup work.
You should ideally run the sessions in a long-running process.
Right, I understand that, of course. My point is to move those "warm-up" actions and memory allocations to the object's creation/initialization instead of the first real prediction. Is there any workaround to do that?
The PR referenced above (https://github.com/dotnet/machinelearning/pull/4456) attempted to do this, but it was decided that we did not want to put a hack in.
I'd like to know whether we can do anything to reduce the time needed for the first prediction when using the new DNN-based (TensorFlow) Image Classification model.
These behaviors/times occur when using the default DNN architecture, ResnetV250.
When using the CPU, the first prediction takes between 7 and 12 seconds, depending on the model and environment. Subsequent predictions using the same PredictionEngine only need around 200 ms on CPU.
When using a GPU, the difference is even larger: around 15 seconds for the first prediction, then far less for subsequent ones (between 40 ms and 100 ms).
Basically, after the first prediction it performs well with CPU, and even better with GPU, but the first prediction needs a huge amount of time, probably for internal initialization.
Could that initialization be improved, or happen before calling .Predict()? For instance, could it run in advance when the prediction engine is created, instead of during the first prediction?
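Until the engine initializes in advance internally, a caller-side workaround is to issue a throwaway prediction immediately after creating the engine, before serving real requests. The pattern is sketched below in Python with a stand-in for the engine (the names and the cached one-time cost are illustrative, not ML.NET APIs):

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def _load_and_optimize_graph():
    """One-time cost paid on the first call only (simulated)."""
    time.sleep(0.5)
    return object()   # stand-in for an optimized graph

def predict(image):
    _load_and_optimize_graph()   # no-op after the first call
    return "roses"               # stand-in for real inference

def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000  # ms

# Warm up at startup with a throwaway input so real traffic is fast.
predict("dummy.png")

_, first_ms = timed(predict, "BlackRose.png")
print(f"First real prediction took: {first_ms:.0f} ms")
```

In the ML.NET console sample the equivalent would be calling `Predict` once on a dummy image right after creating the `PredictionEngine`; the same idea applies to warming up engines behind a `PredictionEnginePool` at service startup.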
@codemzs - Thoughts?