Hello @v-fuchs
There are multiple reasons that could explain why you get a different confidence result than DIGITS; there is a similar discussion in https://github.com/NVIDIA/DIGITS/issues/1231. For instance, the pre-processing (resize, image channels, crop/squash strategy) might be different.
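As a concrete illustration of how the crop/squash strategy alone changes what the network sees, the two ImageMagick commands below (file names are placeholders, this is not part of the GRE pipeline) produce different 256x256 inputs from the same picture:
convert input.jpg -resize '256x256!' squashed.jpg                                  # squash: force 256x256, distorting the aspect ratio
convert input.jpg -resize '256x256^' -gravity center -extent 256x256 cropped.jpg   # fill to 256 on the short side, then center-crop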
However, if the prediction for the same image varies between two requests, there might indeed be a problem. I will try to take a look with GoogleNet; this code was mainly tested with AlexNet/CaffeNet. Thank you for your report.
Can you try removing --default-stream per-thread from this line and launching your tests again? It might impact performance negatively, but I know there are some corner cases in the default stream behavior of CUDA.
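For context, --default-stream per-thread is an nvcc compile option: with it, every host thread gets its own default stream, and those per-thread default streams do not synchronize with each other the way the legacy default stream does. A hypothetical compile invocation (the file name is a placeholder, not the actual GRE build line):
nvcc --default-stream per-thread -O2 -c example.cu -o example.o   # per-thread default streams
nvcc -O2 -c example.cu -o example.o                               # legacy, globally synchronizing default stream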
Thanks a lot, Felix, for your fast help!
I will try your suggestions first thing tomorrow at work (Central European Time ;)). I will also test AlexNet and CaffeNet models on our GRE solution and check the confidence stability for the same image.
best regards Valentin
Can you try removing --default-stream per-thread from this line and launching your tests again? It might impact performance negatively, but I know there are some corner cases in the default stream behavior of CUDA.
This fix solved the problem! But unfortunately the inference time went up from about 25 ms to 250 ms for one image (GoogleNet).
Weird, it doesn't match my observations. With GoogleNet on a single GeForce 1080, with --default-stream per-thread removed from both Caffe and OpenCV:
boom -c 2 -n 10000 -m POST -d @images/2.jpg http://127.0.0.1:8000/api/classify
Summary:
Total: 66.2839 secs
Slowest: 0.0202 secs
Fastest: 0.0085 secs
Average: 0.0133 secs
Requests/sec: 150.8663
Total data: 3120000 bytes
Size/request: 312 bytes
Status code distribution:
[200] 10000 responses
Response time histogram:
0.009 [1] |
0.010 [7] |
0.011 [43] |
0.012 [1267] |∎∎∎∎∎∎∎∎∎∎∎∎
0.013 [3185] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.014 [4182] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.016 [1080] |∎∎∎∎∎∎∎∎∎∎
0.017 [216] |∎∎
0.018 [9] |
0.019 [3] |
0.020 [7] |
With the original code:
boom -c 2 -n 10000 -m POST -d @images/2.jpg http://127.0.0.1:8000/api/classify
Summary:
Total: 61.8989 secs
Slowest: 0.0207 secs
Fastest: 0.0073 secs
Average: 0.0124 secs
Requests/sec: 161.5538
Total data: 3119960 bytes
Size/request: 311 bytes
Status code distribution:
[200] 10000 responses
Response time histogram:
0.007 [1] |
0.009 [2] |
0.010 [0] |
0.011 [0] |
0.013 [8606] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.014 [1365] |∎∎∎∎∎∎
0.015 [24] |
0.017 [0] |
0.018 [1] |
0.019 [0] |
0.021 [1] |
Can you check that it's not just a one-time initialization cost?
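One way to rule that out is to fire a single warm-up request before benchmarking, for example (a sketch only, reusing the endpoint and image from the boom runs above): the first request pays the CUDA context and cuDNN initialization cost, so excluding it gives a better picture of steady-state latency.
curl -s -o /dev/null -X POST --data-binary @images/2.jpg http://127.0.0.1:8000/api/classify   # warm-up request, not timed
boom -c 2 -n 10000 -m POST -d @images/2.jpg http://127.0.0.1:8000/api/classify                # steady-state benchmark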
Hi Felix,
thank you for your benchmark results. I will do the same tomorrow on different setups and post my results!
best regards Valentin
Hello @v-fuchs, were you able to investigate more?
Hi Felix,
unfortunately I wasn't able to test the predictions in more detail. With the fix you suggested (removing --default-stream per-thread from the Docker script) we now get 100% stable predictions with GoogleNet.
This fix solved the problem! But unfortunately the inference time went up from about 25 ms to 250 ms for one image (GoogleNet).
That statement of mine was wrong! The inference time (with the fix) is now between 35 ms and 100 ms, which is pretty good for our needs. If I can help you out with some more benchmarks, let me know exactly what I should test.
Thanks a lot for your help!
Valentin
P.S.: Tested on a Zotac MAGNUS EN980 SPECIAL EDITION
Can you try removing --default-stream per-thread from this line and launching your tests again? It might impact performance negatively, but I know there are some corner cases in the default stream behavior of CUDA.
If you do this, don't forget to re-insert a && operator in the line right before it, like I just did... Luckily, if you do forget, you can just execute the Docker container build again and it picks up right where it left off.
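As a purely illustrative sketch (the path and commands are placeholders, not the actual contents of Dockerfile.inference_server), a chained RUN step looks like the one below; whichever command you delete, the remaining ones still have to be joined by &&. Because Docker caches each instruction as a separate layer, re-running the build after such an edit only redoes the modified RUN step and the steps after it.
RUN cd /opt/caffe && \
    mkdir -p build && cd build && \
    cmake .. && \
    make -j"$(nproc)"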
Hi,
thank you again for providing the GRE solution, which is very helpful for us. Unfortunately, we still have a big problem.
We trained a caffemodel on an NVIDIA DevBox with 2x Titan X, NVCaffe 0.15, DIGITS 4.0, CUDA 8.0, cuDNN 5.1 (GoogleNet, 20 classes). The trained model works like a charm when classifying pictures with DIGITS itself: the predictions are correct AND ALWAYS return the same result when repeating the inference on the same picture.
Unfortunately, when importing the caffemodel (snapshot.caffemodel, deploy.prototxt, labels.txt, mean.binaryproto) into the GRE solution (we just replaced your model with ours in the Dockerfile.inference_server and left everything else as is), the predictions for the same image keep changing and aren't always correct.
We have deployed our caffemodel to different devices (Caffe compiled for iOS and Android, Caffe standalone, several different versions, ...) and the predictions were always correct and returned the same value for the same picture.
We would really like to use the GRE solution for our projects, but we can't find the problem.
We really appreciate any kind of help!