Open CESARDELATORRE opened 5 years ago
I could not repro this on my machine.
This simple scoring sample reproduces it: https://github.com/dotnet/machinelearning-samples/tree/migration/1.4/samples/csharp/getting-started/DeepLearning_ImageClassification_Training/ImageClassification.Predict
The trained model is created in a project within the same solution, using the default settings of the new API.
I see the same behavior for the first prediction when using the PredictionEnginePool in ASP.NET Core apps. But the sample above is simply a console app doing a prediction.
Mine's around 5 seconds. See output below.
Loading model from: C:\Dev\machinelearning-samples\samples\csharp\getting-started\DeepLearning_ImageClassification_Training\ImageClassification.Predict\bin\Debug\netcoreapp2.1\../../../assets\inputs\MLNETModel\imageClassifier.zip
2019-11-01 14:41:34.285528: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
First Prediction took: 5301mlSecs
Second Prediction took: 165mlSecs
Image Filename : [BlackRose.png], Predicted Label : [roses], Probability : [0.9992207]
Predicting several images...
Image Filename : [BlackRose.png], Predicted Label : [roses], Probability : [0.9992207]
Image Filename : [classic-daisy.jpg], Predicted Label : [daisy], Probability : [0.9997143]
Image Filename : [classic-tulip.jpg], Predicted Label : [tulips], Probability : [0.9992585]
Image Filename : [RareThreeSpiralledRose.png], Predicted Label : [roses], Probability : [1]
Press any key to end the app..
Interestingly, when using different architectures, the scoring time for the first prediction differs significantly (probably because the initialization time for each model is different):
- ResnetV250 (using CPU): first prediction took 5 s, second prediction took 150 ms
- ResnetV2101 (using CPU): first prediction took 3 s, second prediction took 319 ms
- MobilenetV2 (using CPU): first prediction took 1.2 s, second prediction took 63 ms
But in all cases, the difference between the first prediction and subsequent predictions is significant.
Note that the sample loads the model from a .zip file, starting from scratch, and then predicts.
Mine's around 5 seconds.
Mine as well. 5 seconds on first prediction. ~130ms for subsequent.
@ashbhandare is going to be investigating this.
I have investigated the issue, and it seems like the time difference between the first and second prediction is in the Tensorflow session run call as suspected. This is known Tensorflow behavior, as tf does some initialization and graph optimization passes in the first session run.
In order to mask this time within our prediction engine creation, I have a working solution where a session run is executed on a fake tensor during initialization, thus making the subsequent actual .Predict calls take similar time. I am further investigating if there is a better way of initializing the Tensorflow session.
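The idea in the proposed fix — paying the one-time session cost during engine creation by running on a fake tensor — can be sketched generically. The snippet below is a Python stand-in, not ML.NET or TensorFlow code: `SlowFirstRunSession`, `PredictionEngine`, and `warm_up` are illustrative names, and the one-time graph-optimization cost is simulated with a sleep.

```python
import time

class SlowFirstRunSession:
    """Stub for a TF-style session whose first run pays a one-time
    graph-optimization/allocation cost (simulated with a sleep)."""
    def __init__(self):
        self._optimized = False

    def run(self, tensor):
        if not self._optimized:
            time.sleep(0.5)        # stand-in for graph optimization passes
            self._optimized = True
        return sum(tensor)         # trivial stand-in for inference

class PredictionEngine:
    def __init__(self, warm_up=True):
        self._session = SlowFirstRunSession()
        if warm_up:
            # Mask the one-time cost inside engine creation by running
            # the session once on a fake tensor, as the fix proposes.
            self._session.run([0.0])

    def predict(self, tensor):
        start = time.perf_counter()
        result = self._session.run(tensor)
        return result, time.perf_counter() - start

engine = PredictionEngine(warm_up=True)
_, first = engine.predict([1.0, 2.0])
_, second = engine.predict([1.0, 2.0])
# With warm-up, the first real prediction is no slower than the second.
print(f"first={first:.3f}s second={second:.3f}s")
```

With `warm_up=False` the first `predict` call absorbs the whole initialization cost, matching the behavior reported in this issue.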
It's reasonable to assume the first run is slow: the first pass always takes longer because the backend is allocating memory and doing similar setup work.
You should ideally run the sessions in a long-running process.
Right, I understand that, of course. My point is to move those "warm-up" actions and memory allocations to the object's creation/initialization instead of the first real prediction. Is there any workaround to do that?
The PR referenced above (https://github.com/dotnet/machinelearning/pull/4456) attempted to do this, but it was decided that we did not want to put a hack in.
I'd like to know whether we can do anything to reduce the time needed for the first prediction when using the new DNN-based (TensorFlow) Image Classification model.
These behaviors/times occur when using the default DNN architecture, ResnetV250.
When using the CPU, the first prediction takes between 7 and 12 seconds, depending on the model and environment. Subsequent predictions using the same PredictionEngine only need around 200 ms on CPU.
When using a GPU, the difference is even larger: around 15 seconds for the first prediction, then far less for subsequent ones (between 40 ms and 100 ms).
Basically, after the first prediction it performs well with CPU, and even better with GPU, but the first prediction needs a huge amount of time, probably for internal initialization.
Could that initialization be improved, or happen before calling .Predict()? For instance, could it run in advance when the prediction engine is created, instead of during the first prediction?
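Until the engine initializes in advance internally, a caller-side workaround is to issue a throwaway prediction immediately after creating the engine, before serving real requests. The pattern is sketched below in Python with a stand-in for the engine (the names and the cached one-time cost are illustrative, not ML.NET APIs):

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def _load_and_optimize_graph():
    """One-time cost paid on the first call only (simulated)."""
    time.sleep(0.5)
    return object()   # stand-in for an optimized graph

def predict(image):
    _load_and_optimize_graph()   # no-op after the first call
    return "roses"               # stand-in for real inference

def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000  # ms

# Warm up at startup with a throwaway input so real traffic is fast.
predict("dummy.png")

_, first_ms = timed(predict, "BlackRose.png")
print(f"First real prediction took: {first_ms:.0f} ms")
```

In the ML.NET console sample the equivalent would be calling `Predict` once on a dummy image right after creating the `PredictionEngine`; the same idea applies to warming up engines behind a `PredictionEnginePool` at service startup.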
@codemzs - Thoughts?