NVIDIA / gpu-rest-engine

A REST API for Caffe using Docker and Go
BSD 3-Clause "New" or "Revised" License

caffe-server - same images different results? #19

Closed mkh-github closed 6 years ago

mkh-github commented 6 years ago

I'm using GRE with my own GoogLeNet model, fine-tuned for my own classes. When I run a set of images through curl, I get correct results the first time, but later runs return different results, mostly with incorrect labels. Sometimes I also don't get the correct results right after stopping and restarting caffe-server, and even a single image sent repeatedly with curl can return different results. Any ideas on what is going on, or how I can debug this?

This is what I did:

  1. Build the Docker image after updating the final CMD in Dockerfile.caffe_server to point caffe-server to my deploy/caffemodel/labels files.

  2. Start caffe-server in Docker.

  3. Create a bash script to run multiple curl commands:

    $ more c.sh
    curl -XPOST --data-binary @images/mine/1.jpg http://127.0.0.1:8000/api/classify 
    curl -XPOST --data-binary @images/mine/2.jpg http://127.0.0.1:8000/api/classify 
    curl -XPOST --data-binary @images/mine/3.jpg http://127.0.0.1:8000/api/classify 
    curl -XPOST --data-binary @images/mine/4.jpg http://127.0.0.1:8000/api/classify 
    curl -XPOST --data-binary @images/mine/5.jpg http://127.0.0.1:8000/api/classify 
    curl -XPOST --data-binary @images/mine/6.jpg http://127.0.0.1:8000/api/classify

  4. Run a set of images through. These are correct results -- the Top-1 classification is the correct classification.

    $ bash c.sh
    [{"confidence":1.0000,"label":"1 car#1"},{"confidence":0.0000,"label":"7 car#7"},{"confidence":0.0000,"label":"48 car#48 "},{"confidence":0.0000,"label":"31 car#31 "},{"confidence":0.0000,"label":"24 car#24 "}]
    [{"confidence":1.0000,"label":"1 car#1"},{"confidence":0.0000,"label":"7 car#7"},{"confidence":0.0000,"label":"31 car#31 "},{"confidence":0.0000,"label":"48 car#48 "},{"confidence":0.0000,"label":"72 car#72 "}]
    [{"confidence":0.9993,"label":"42 car#42 "},{"confidence":0.0004,"label":"72 car#72 "},{"confidence":0.0002,"label":"15 car#15 "},{"confidence":0.0001,"label":"48 car#48 "},{"confidence":0.0000,"label":"37 car#37 "}]
    [{"confidence":0.9983,"label":"42 car#42 "},{"confidence":0.0015,"label":"15 car#15 "},{"confidence":0.0002,"label":"72 car#72 "},{"confidence":0.0000,"label":"48 car#48 "},{"confidence":0.0000,"label":"37 car#37 "}]
    [{"confidence":0.9929,"label":"83 car#83 "},{"confidence":0.0049,"label":"18 car#18 "},{"confidence":0.0014,"label":"37 car#37 "},{"confidence":0.0002,"label":"4 car#4"},{"confidence":0.0002,"label":"31 car#31 "}]
    [{"confidence":0.9950,"label":"83 car#83 "},{"confidence":0.0025,"label":"37 car#37 "},{"confidence":0.0015,"label":"18 car#18 "},{"confidence":0.0003,"label":"4 car#4"},{"confidence":0.0001,"label":"31 car#31 "}]

  5. Run them again -- this time incorrect results.

    $ bash c.sh
    [{"confidence":1.0000,"label":"1 car#1"},{"confidence":0.0000,"label":"7 car#7"},{"confidence":0.0000,"label":"48 car#48 "},{"confidence":0.0000,"label":"31 car#31 "},{"confidence":0.0000,"label":"24 car#24 "}]
    [{"confidence":0.2805,"label":"32 car#32 "},{"confidence":0.2293,"label":"78 car#78 "},{"confidence":0.1423,"label":"19 car#19 "},{"confidence":0.1389,"label":"23 car#23 "},{"confidence":0.0670,"label":"83 car#83 "}]
    [{"confidence":0.9961,"label":"1 car#1"},{"confidence":0.0039,"label":"48 car#48 "},{"confidence":0.0000,"label":"24 car#24 "},{"confidence":0.0000,"label":"72 car#72 "},{"confidence":0.0000,"label":"42 car#42 "}]
    [{"confidence":0.6228,"label":"1 car#1"},{"confidence":0.2645,"label":"42 car#42 "},{"confidence":0.0975,"label":"48 car#48 "},{"confidence":0.0123,"label":"72 car#72 "},{"confidence":0.0022,"label":"15 car#15 "}]
    [{"confidence":0.6866,"label":"72 car#72 "},{"confidence":0.2250,"label":"38 car#38 "},{"confidence":0.0786,"label":"42 car#42 "},{"confidence":0.0045,"label":"37 car#37 "},{"confidence":0.0032,"label":"47 car#47 "}]
    [{"confidence":0.8894,"label":"72 car#72 "},{"confidence":0.0574,"label":"42 car#42 "},{"confidence":0.0279,"label":"38 car#38 "},{"confidence":0.0122,"label":"37 car#37 "},{"confidence":0.0088,"label":"47 car#47 "}]

  6. Run them again -- still incorrect results, but also different results than the prior run.

    $ bash c.sh
    [{"confidence":0.8562,"label":"1 car#1"},{"confidence":0.0738,"label":"31 car#31 "},{"confidence":0.0165,"label":"7 car#7"},{"confidence":0.0157,"label":"5 car#5"},{"confidence":0.0143,"label":"83 car#83 "}]
    [{"confidence":0.7486,"label":"1 car#1"},{"confidence":0.1179,"label":"31 car#31 "},{"confidence":0.0478,"label":"83 car#83 "},{"confidence":0.0380,"label":"72 car#72 "},{"confidence":0.0156,"label":"5 car#5"}]
    [{"confidence":0.8816,"label":"1 car#1"},{"confidence":0.1139,"label":"48 car#48 "},{"confidence":0.0039,"label":"42 car#42 "},{"confidence":0.0005,"label":"72 car#72 "},{"confidence":0.0001,"label":"15 car#15 "}]
    [{"confidence":0.6118,"label":"42 car#42 "},{"confidence":0.1504,"label":"1 car#1"},{"confidence":0.1332,"label":"48 car#48 "},{"confidence":0.0662,"label":"72 car#72 "},{"confidence":0.0193,"label":"15 car#15 "}]
    [{"confidence":0.8353,"label":"72 car#72 "},{"confidence":0.0847,"label":"42 car#42 "},{"confidence":0.0743,"label":"38 car#38 "},{"confidence":0.0025,"label":"37 car#37 "},{"confidence":0.0015,"label":"1 car#1"}]
    [{"confidence":0.9453,"label":"72 car#72 "},{"confidence":0.0203,"label":"42 car#42 "},{"confidence":0.0198,"label":"38 car#38 "},{"confidence":0.0080,"label":"37 car#37 "},{"confidence":0.0034,"label":"47 car#47 "}]
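The repeated runs above could also be automated as a consistency check. The following is only a sketch: the function names check_image and count_distinct are made up for illustration, and the default URL is the endpoint used in c.sh. It posts one image several times and reports how many distinct responses came back. A deterministic server should give exactly one.

```shell
#!/usr/bin/env bash
# Sketch: post the same image N times and count the distinct responses.
# check_image and count_distinct are hypothetical helper names.

# Print the number of distinct lines read from stdin.
count_distinct() {
    sort -u | wc -l | tr -d '[:space:]'
}

# check_image IMAGE [RUNS] [URL]
check_image() {
    local image=$1
    local runs=${2:-10}
    local url=${3:-http://127.0.0.1:8000/api/classify}
    local i distinct
    distinct=$(
        for ((i = 0; i < runs; i++)); do
            # $(...) strips trailing newlines, printf adds exactly one,
            # so every response occupies a single line.
            printf '%s\n' "$(curl -s -XPOST --data-binary @"$image" "$url")"
        done | count_distinct
    )
    if [ "$distinct" -eq 1 ]; then
        echo "$image: consistent over $runs runs"
    else
        echo "$image: $distinct distinct responses over $runs runs"
        return 1
    fi
}

# Example: check_image images/mine/1.jpg 20
```

This only compares responses with each other. It does not check that the label is correct, and a server that is down returns the same empty response every time, so it would also be reported as consistent.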

flx42 commented 6 years ago

Hello @mkh-github,

Can you try removing --default-stream per-thread in the Dockerfile?

https://github.com/NVIDIA/gpu-rest-engine/blob/master/Dockerfile.caffe_server#L40
https://github.com/NVIDIA/gpu-rest-engine/blob/master/Dockerfile.caffe_server#L52
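For anyone who prefers not to edit the file by hand, the change might be scripted roughly as follows. This is a sketch that assumes the flag appears as the literal text --default-stream per-thread on those lines. The rest of each line is not shown in this thread, so diff the result before rebuilding. The helper name strip_default_stream is made up for illustration.

```shell
# strip_default_stream FILE
# Hypothetical helper: deletes every literal occurrence of
# "--default-stream per-thread" (plus one preceding space, if present)
# and keeps the original as FILE.bak.
strip_default_stream() {
    local file=$1
    cp "$file" "$file.bak"
    sed 's/ \{0,1\}--default-stream per-thread//g' "$file.bak" > "$file"
}

# Example:
#   strip_default_stream Dockerfile.caffe_server
#   diff Dockerfile.caffe_server.bak Dockerfile.caffe_server
```

After checking the diff, rebuild the Docker image and restart caffe-server so the change takes effect.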

Some users have reported the same issue and doing the fix above helped, but I've never been able to reproduce it. If it solves your problem, I will edit the Dockerfile.

mkh-github commented 6 years ago

@flx42 I'll try that.

mkh-github commented 6 years ago

@flx42 - your solution fixed my problem. Thanks for the quick response!

flx42 commented 6 years ago

Ok, I've submitted the patch upstream. Thanks!