NVIDIA / gpu-rest-engine

A REST API for Caffe using Docker and Go
BSD 3-Clause "New" or "Revised" License

Confidence varies for each API call (for the same picture) #3

Closed · v-fuchs closed this issue 7 years ago

v-fuchs commented 7 years ago

Hi,

thank you again for providing the GRE solution, which is very helpful for us. Unfortunately, we still have a big problem.

We trained a caffemodel on an NVIDIA DevBox with 2 Titan X GPUs, nvcaffe 0.15, DIGITS 4.0, CUDA 8.0, cuDNN 5.1, and GoogleNet for 20 classes. The trained model works like a charm when classifying pictures within DIGITS itself: the predictions are correct AND ALWAYS return the same response when the inference is repeated on the same picture.

Unfortunately, when importing the caffemodel (snapshot.caffemodel, deploy.prototxt, labels.txt, mean.binaryproto) into the GRE solution (we just replaced your model with ours in Dockerfile.inference_server, roughly as sketched below, and left everything else as it is), the predictions for the same image keep changing and are not always correct.
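The change was essentially along these lines (the paths, entrypoint name, and argument order below are placeholders, not the exact contents of the Dockerfile):

# Copy our own model files instead of downloading the reference model.
COPY snapshot.caffemodel deploy.prototxt mean.binaryproto labels.txt /models/custom/

# Point the inference server at our files.
CMD ["inference", \
     "/models/custom/deploy.prototxt", \
     "/models/custom/snapshot.caffemodel", \
     "/models/custom/mean.binaryproto", \
     "/models/custom/labels.txt"]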

We deployed our caffemodel to different devices (Caffe compiled for iOS and Android, Caffe standalone, several different versions, ...) and the predictions were always correct and always returned the same value for the same picture.

We would really like to use the GRE solution for our projects, but we can't find the problem.

We really appreciate any kind of help!

flx42 commented 7 years ago

Hello @v-fuchs

There are multiple reasons that could explain why you get a different confidence result than DIGITS; there is a similar discussion in https://github.com/NVIDIA/DIGITS/issues/1231. For instance, the pre-processing (resize, image channels, crop/squash strategy) might be different.
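To illustrate, squashing versus center-cropping alone can already shift the confidences. Below is a minimal OpenCV sketch of the two strategies, not the exact code used by this repository or by DIGITS:

// Minimal sketch of two pre-processing strategies (not the exact code used
// by this repository or by DIGITS).
#include <algorithm>
#include <opencv2/opencv.hpp>

// Squash: stretch the image to the network input size, ignoring aspect ratio.
cv::Mat squash(const cv::Mat& img, cv::Size input) {
    cv::Mat out;
    cv::resize(img, out, input);
    return out;
}

// Crop: scale so the image covers the input size, then take a center crop.
cv::Mat centerCrop(const cv::Mat& img, cv::Size input) {
    double scale = std::max(double(input.width) / img.cols,
                            double(input.height) / img.rows);
    cv::Mat resized;
    cv::resize(img, resized, cv::Size(), scale, scale);
    cv::Rect roi((resized.cols - input.width) / 2,
                 (resized.rows - input.height) / 2,
                 input.width, input.height);
    return resized(roi).clone();
}

If DIGITS used one strategy during training/testing and the server uses the other at inference time, the confidences will not match exactly.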

However, if the prediction for the same image varies between two requests, there might indeed be a problem. I will try to take a look with GoogleNet; this code was mainly tested with AlexNet/CaffeNet. Thank you for your report.

flx42 commented 7 years ago

Can you try removing --default-stream per-thread from this line and launch your tests again? It might impact performance negatively, but I know there are some corner cases in the default stream behavior of CUDA.
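For context, that flag is passed to nvcc when Caffe and OpenCV are compiled inside Dockerfile.inference_server. The snippet below only sketches the shape of that build step; the actual line in the Dockerfile differs:

# Sketch only (not the exact Dockerfile line): the flag is appended to the
# nvcc flags for the Caffe (and OpenCV) build.
RUN cd /opt/caffe/build && \
    cmake -DCUDA_NVCC_FLAGS="--default-stream per-thread" .. && \
    make -j"$(nproc)"
# Removing the flag means dropping the -DCUDA_NVCC_FLAGS=... argument above.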

v-fuchs commented 7 years ago

Thanks a lot, Felix, for your fast help!

I will try your suggestions first thing tomorrow at work (Central European Time ;). I will also test AlexNet and CaffeNet models on our GRE setup and check the confidence stability for the same image.

best regards Valentin

v-fuchs commented 7 years ago

Can you try removing --default-stream per-thread from this line and launch your tests again? It might impact performance negatively, but I know there are some corner cases in the default stream behavior of CUDA.

This fix solved the problem! But unfortunately the inference time increased from about 25 ms to 250 ms for one image (GoogleNet).

flx42 commented 7 years ago

Weird, that doesn't match my observations. With GoogleNet on a single GeForce 1080, with --default-stream per-thread removed from both Caffe and OpenCV:

boom -c 2 -n 10000 -m POST -d @images/2.jpg http://127.0.0.1:8000/api/classify
Summary:
  Total:        66.2839 secs
  Slowest:      0.0202 secs
  Fastest:      0.0085 secs
  Average:      0.0133 secs
  Requests/sec: 150.8663
  Total data:   3120000 bytes
  Size/request: 312 bytes

Status code distribution:
  [200] 10000 responses

Response time histogram:
  0.009 [1]     |
  0.010 [7]     |
  0.011 [43]    |
  0.012 [1267]  |∎∎∎∎∎∎∎∎∎∎∎∎
  0.013 [3185]  |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.014 [4182]  |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.016 [1080]  |∎∎∎∎∎∎∎∎∎∎
  0.017 [216]   |∎∎
  0.018 [9]     |
  0.019 [3]     |
  0.020 [7]     |

With the original code:

boom -c 2 -n 10000 -m POST -d @images/2.jpg http://127.0.0.1:8000/api/classify
Summary:
  Total:        61.8989 secs
  Slowest:      0.0207 secs
  Fastest:      0.0073 secs
  Average:      0.0124 secs
  Requests/sec: 161.5538
  Total data:   3119960 bytes
  Size/request: 311 bytes

Status code distribution:
  [200] 10000 responses

Response time histogram:
  0.007 [1]     |
  0.009 [2]     |
  0.010 [0]     |
  0.011 [0]     |
  0.013 [8606]  |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.014 [1365]  |∎∎∎∎∎∎
  0.015 [24]    |
  0.017 [0]     |
  0.018 [1]     |
  0.019 [0]     |
  0.021 [1]     |

Can you check that it's not just a one-time initialization cost?
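For example, something along these lines separates the first (warm-up) request from the steady-state latency; the image and endpoint are simply the ones used in the benchmark above:

# Rough sketch: time the first request separately from the following ones.
# The first call pays any one-time initialization, later calls should not.
time curl -s -X POST --data-binary @images/2.jpg \
    http://127.0.0.1:8000/api/classify > /dev/null
for i in 1 2 3 4 5; do
    time curl -s -X POST --data-binary @images/2.jpg \
        http://127.0.0.1:8000/api/classify > /dev/null
done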

v-fuchs commented 7 years ago

Hi Felix,

thank you for your benchmark results. I will do the same tomorrow on different setups and post the results!

best regards Valentin

flx42 commented 7 years ago

Hello @v-fuchs, were you able to investigate more?

v-fuchs commented 7 years ago

Hi Felix,

unfortunately I wasn't able to test the predictions in more detail. With the fix you suggested (removing --default-stream per-thread from the Docker build), we now have 100% stable predictions with GoogleNet.

This fix solved the problem! But unfortunately the inference time increased from about 25 ms to 250 ms for one image (GoogleNet).

That statement of mine was wrong! The inference time (with the fix) is now between 35 ms and 100 ms, which is pretty good for our needs. If I can help you out with some more benchmarks, let me know what exactly I should test.

Thanks a lot for your help!

Valentin

P.S.: Tested on a Zotac MAGNUS EN980 SPECIAL EDITION

kraigrs commented 7 years ago

Can you try removing --default-stream per-thread from this line and launch your tests again? It might impact performance negatively, but I know there are some corner cases in the default stream behavior of CUDA.

If you do this, don't forget to re-insert the && operator on the line right before it, like I just did... Luckily, if you do forget, you can just run the Docker container build again and it picks up right where it left off thanks to layer caching.
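Roughly, the edit looks like this (a sketch only, not the repository's exact RUN command):

# Sketch only: once the continuation line carrying "--default-stream per-thread"
# is deleted, the previous line must end with "&& \" again so the shell chain
# inside the RUN instruction stays valid.
RUN cd /opt/caffe/build && \
    cmake .. && \
    make -j"$(nproc)"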