NVIDIA / gpu-rest-engine

A REST API for Caffe using Docker and Go
BSD 3-Clause "New" or "Revised" License

Dockerfile.inference_server for CentOS? #2

Closed · v-fuchs closed this issue 8 years ago

v-fuchs commented 8 years ago

Hi,

I would really like to use the GPU REST Engine (GRE) in our projects, but unfortunately we are stuck during the installation because we are using CentOS. Is there a Dockerfile.inference_server for CentOS as well? I tried to adapt the script for a CentOS environment, but in the end the server wouldn't start:

I would greatly appreciate any help. Thanks in advance! Valentin

flx42 commented 8 years ago

Since you are using Docker, why do you need CentOS inside the container too? You can have a CentOS host but run containers based on Ubuntu 14.04 or Ubuntu 16.04.
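A minimal sketch of that workflow on a CentOS host with Docker and nvidia-docker installed (the image tag and run flags here are illustrative; the Dockerfile name comes from this repository):

# Build an Ubuntu-based image from the repository's Dockerfile, on the CentOS host
$ docker build -t inference_server -f Dockerfile.inference_server .
# Run it with GPU access; the container OS does not need to match the host OS
$ nvidia-docker run --name=server --net=host --rm inference_server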

v-fuchs commented 8 years ago

Hi Felix, thank you very much for your answer.

I didn't know much about Docker, so I read some documentation after your post, and I am now fully convinced by this solution.

Unfortunately, I'm now on vacation for one week, so I can't try your suggested solution on our production servers at work. I'll try it immediately after my return and give you feedback.

For now I have one remaining question: is it possible to add more than one caffemodel to a single inference_server container, or do I have to create another container for every caffemodel? If it is possible, what would the curl command look like? How do I switch between different models?

Thank you again for your fast reply and your greatly appreciated help.

Best regards,
Valentin

flx42 commented 8 years ago

The best approach is probably to have one Docker container (i.e., one server) per model; it keeps the code simple. You will have multiple servers running on the same machine, and you just need to change the port in your curl command to use a different model.
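For example, a hedged sketch with two models (the per-model image names are hypothetical, and each container maps the server's internal port 8000, as in the curl example later in this thread, to a different host port):

$ nvidia-docker run -d -p 8000:8000 model_a_server
$ nvidia-docker run -d -p 8001:8000 model_b_server
# Same endpoint; a different port selects a different model
$ curl -XPOST --data-binary @images/1.jpg http://127.0.0.1:8000/api/classify
$ curl -XPOST --data-binary @images/1.jpg http://127.0.0.1:8001/api/classify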

v-fuchs commented 8 years ago

Hi Felix,

Thanks again for your help.

I have a few more questions about the Dockerfile.inference_server:

1. At the beginning of the script you define the following:

ENV CUDA_ARCH_BIN "30 35 50 52 60"
ENV CUDA_ARCH_PTX "60"

Maybe I'm wrong, but isn't "60" an invalid CUDA compute capability? Referring to https://developer.nvidia.com/cuda-gpus, shouldn't it be 61 instead of 60 for the new Pascal GPUs?

2. Later in the script you mention that you are using a modified version of Caffe from your GitHub repository. Can I also use Caffe from the NVIDIA GitHub repository, with the same benefits when running GRE, by building it with the CMake parameters you mention? Or could you briefly explain what modifications you made in your Caffe version? I'm a little confused because you clone your bvlc_inference branch. Is that because you are building this container for the BVLC CaffeNet model? Should I clone a different branch of your repository when using my own dataset or caffemodel, or is it always the same branch?

3. A question similar to 2.: do I have to use OpenCV 3.0.0, or can I use 2.4.13 instead without any drawbacks?

Thanks a lot for your very helpful advice.

Best regards,
Valentin

flx42 commented 8 years ago
  1. Looks like a typo on our webpage, I got it fixed ;) Thanks for the report! 6.0 is the new P100, but the code will still work on 6.1: the PTX embedded for compute capability 6.0 gets JIT-compiled by the driver for newer GPUs.
  2. The branch is called bvlc_inference because it is based on BVLC/caffe, not NVIDIA/caffe. You could try NVIDIA/caffe, but last time I tried there was no performance difference for inference with a batch size of 1; NVIDIA/caffe really shines for multi-GPU training with large batch sizes and complex networks. I have a single patch on top of BVLC/caffe: https://github.com/flx42/caffe/commit/1a5187a259a5cb31fef0e091bfe4795b268b1238. This is useful because the Go HTTP server creates many threads, and we don't want each thread to create a Caffe context on the GPUs, since that would waste memory; this is a limitation of the Caffe design. You don't strictly need this patch, you can try without it, but you will have to modify the code a little and make sure everything still works as expected. See the build sketch after this list.
  3. OpenCV 3.0 is preferred because you can define a custom memory pool for GPU allocations; otherwise you lose performance due to the many cudaMalloc/cudaFree calls made while preprocessing images. But again, since you use Docker, you don't have to care about which version of OpenCV the host has; each container can use its own version of OpenCV.
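For reference, a hedged sketch of building the patched branch from item 2 manually (the clone URL and branch name come from the links above; the CMake invocation is an assumption, the authoritative flags are the ones in Dockerfile.inference_server):

$ git clone -b bvlc_inference https://github.com/flx42/caffe.git
$ cd caffe && mkdir build && cd build
# Assumed flags; see Dockerfile.inference_server for the exact configuration
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ make -j"$(nproc)"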
v-fuchs commented 8 years ago

Thank you very much for your detailed answer!

v-fuchs commented 8 years ago

I could now successfully build and run the inference server.

I have one remaining problem. The classification using

$ curl -XPOST --data-binary @images/1.jpg http://127.0.0.1:8000/api/classify

works only when executed with superuser rights. When running without them, I get the following error:


Access Denied (authentication_failed)

Your credentials could not be authenticated: "General authentication failure due to bad user ID or authentication token.". You will not be permitted access until your credentials can be verified.
This is typically caused by an incorrect username and/or password, but could also be caused by network problems.

For assistance, contact your network support team.

Apparently this is some kind of permissions problem. How can I make the API accessible to everybody on the network, or how do I create a user/password with the relevant rights to access the API?

flx42 commented 8 years ago

I don't know how you ended up with that error; it certainly isn't coming from my code. You need to check your network settings, and check that it's actually the inference server that is running on port 8000.
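A few hedged checks that might help narrow it down (the proxy hypothesis is a guess based on the wording of that error page, which looks like it was generated by an HTTP proxy rather than by the Go server):

# Confirm the inference server is the process listening on port 8000
$ docker ps
$ sudo ss -tlnp | grep ':8000'
# sudo usually does not inherit http_proxy/https_proxy from your shell, which
# would explain why the request only succeeds as root; check and bypass the proxy
$ env | grep -i proxy
$ curl --noproxy 127.0.0.1 -XPOST --data-binary @images/1.jpg http://127.0.0.1:8000/api/classify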