You likely need to take the latest version of TensorRT instead, yes. You will get better performance. You will probably need to make small edits to the Dockerfile for that.
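If you do bump the version, it is worth confirming what actually ended up in the image after the rebuild. A minimal sketch, assuming a .deb-based TensorRT install and that the image's default command can be overridden; the Dockerfile name and image tag below are placeholders for your setup:

```shell
# Rebuild with the edited Dockerfile, then list the installed libnvinfer
# packages to confirm the TensorRT version you expected is really there.
docker build -t inference_server -f <your-tensorrt-Dockerfile> .
docker run --rm inference_server dpkg -l | grep -i nvinfer
```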
You might consider:

- The new, just-released TensorRT Inference Server, which gives you a very simple mechanism for serving TensorRT engine files: https://ngc.nvidia.com/registry/nvidia-inferenceserver and https://github.com/NVIDIA/dl-inference-server
- Or, if you are looking for a more custom microservice, https://github.com/NVIDIA/yais
- I'm also working on a TensorRT Inference Server model store builder. It's a work in progress, and note that it lives on a feature branch, but you might find it useful (the model store layout it targets is sketched below): https://github.com/NVIDIA/yais/tree/feature-pybind11/examples/12_ConfigGenerator
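To make the model store idea concrete, here is a rough sketch of the directory layout the inference server consumes and how you would point the NGC container at it. The model name, local path, container tag, and port mapping are placeholders; check the release notes for your version for the exact image tag and server flags:

```shell
# Illustrative model store layout (the config generator above emits the
# config.pbtxt for each model):
#
#   /path/to/model-store/
#     my_trt_model/
#       config.pbtxt        # model configuration
#       1/
#         model.plan        # serialized TensorRT engine for version 1
#
# Launch the server from the NGC container with the store mounted in.
nvidia-docker run --rm -p 8000:8000 \
  -v /path/to/model-store:/models \
  nvcr.io/nvidia/inferenceserver:<release-tag> \
  trtserver --model-store=/models
```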
I have already worked with the most recent release of TensorRT and the Inference Server, and found that the Linux implementation works, but the inference client does not work on Windows in its current form. My workflow is an application that interfaces using JSON data.
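Since the GRE server itself only exposes an HTTP endpoint that returns JSON, my application can call it from Windows directly without any native client library. Something like the following; the path and port are what I remember from the GRE README, so they may differ:

```shell
# Hypothetical request from any OS; adjust host, port and endpoint to match
# how the server is actually exposed.
curl -X POST --data-binary @images/cat.jpg http://<server-host>:8000/api/classify
# The reply is a JSON array of label/confidence pairs that the application
# can parse directly.
```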
```
gpu-rest-engine-master$ nvidia-docker run --name=server --net=host --rm inference_server
2018/09/18 02:31:30 Initializing TensorRT classifiers
```
I am just trying to get the TensorRT server started, and on two different servers with fresh downloads of the GRE it gets stuck at the "Initializing TensorRT classifiers" state. I have been able to get the GRE Caffe server code up and running. I have tried clearing the Docker cache, with no success.
I'm running this on a DGX-1 (16 GB Volta version). I'm wondering if TensorRT 2 may not work correctly with this GPU.
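For what it's worth, a few checks that might narrow this down while the container sits in that state (the dpkg check assumes TensorRT was installed from .deb packages inside the image):

```shell
# Confirm the container actually sees the V100s.
docker exec server nvidia-smi

# Check which libnvinfer (TensorRT) version is baked into the image.
# TensorRT 2.x predates Volta (compute capability 7.0) support, which
# would be consistent with problems on a DGX-1V.
docker exec server dpkg -l | grep -i nvinfer

# Keep watching the output in case the server is building TensorRT engines
# at startup rather than truly hanging; engine builds can take a while.
docker logs -f server
```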