rarzumanyan opened this issue 3 years ago
Collecting more user data to put requirements together.
@philipp-schmidt @clinthidinger @ruilongzhang @dbasbabasi
Please provide some information regarding the way you'd like to use VPF in conjunction with Triton. As far as I understand, there are several possible options:
- Link VPF against the Triton server C libraries: #1. Basically, Triton will be used as a back-end and will orchestrate the inference on the local machine.
- Implement a gRPC / HTTP client which supports shared system memory: #2.
- Implement a gRPC / HTTP client which supports shared CUDA memory: #3 (a sketch of the VPF side follows below).
- Develop a Python sample which uses the Triton client Python API in conjunction with the VPF Python API to get some video frames and send them to the Triton server for inference.
Please correct me if I'm missing something.
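For illustration, here is a minimal sketch of what the VPF side of option #3 could look like: decode a frame and expose its raw CUDA device pointer. It assumes VPF's Python module imports as PyNvCodec and that the Surface accessors (PlanePtr(), GpuMem(), Pitch()) behave as in current VPF releases; the file name is a placeholder.

```python
# Sketch: decode one frame with VPF and expose its device pointer.
# Assumes the PyNvCodec module from VPF; names can differ between versions.
import PyNvCodec as nvc

gpu_id = 0
nv_dec = nvc.PyNvDecoder("input.mp4", gpu_id)

surface = nv_dec.DecodeSingleSurface()  # NV12 surface in GPU memory
if not surface.Empty():
    plane = surface.PlanePtr()
    # This raw CUDA device pointer (plus pitch and dimensions) is what
    # Triton would have to consume to avoid any copy back to host memory.
    dev_ptr = plane.GpuMem()
    print(f"frame at 0x{dev_ptr:x}, pitch {plane.Pitch()}")
```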
Sorry for the late reply. The only option giving a performance benefit would be option 3. All the other options require a copy back to host memory. If the image is in host memory, then passing it to Triton is really easy and literally what it was built for (a minimal client-side sketch follows below).
Again emphasizing that I'm currently not seeing any bottlenecks with copying the decoded frames back to host memory, even when the GPU is under heavy load. It looks like GPU memory bandwidth is plenty for most use cases. This would be more of a "nice to have" and "wow, what an efficient architecture" thing to do.
Also, I believe it might already be implemented like that in the DeepStream Triton plugin. Not sure though.
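To make the host-memory path concrete, here is a sketch of VPF decoding and downloading a frame, then sending it via a plain Triton HTTP request. The model name ("detector") and tensor names ("input", "output") are placeholders, and the VPF calls assume the PyNvCodec API as in VPF's own samples.

```python
# Sketch of the "copy back to host" path: VPF decode + download, then a
# regular Triton HTTP inference request. Names are placeholders.
import numpy as np
import PyNvCodec as nvc
import tritonclient.http as httpclient

gpu_id = 0
nv_dec = nvc.PyNvDecoder("input.mp4", gpu_id)
w, h = nv_dec.Width(), nv_dec.Height()

# Download the decoded NV12 surface into a host-side numpy array.
downloader = nvc.PySurfaceDownloader(w, h, nvc.PixelFormat.NV12, gpu_id)
frame = np.ndarray(shape=(w * h * 3 // 2,), dtype=np.uint8)

surface = nv_dec.DecodeSingleSurface()
if not surface.Empty() and downloader.DownloadSingleSurface(surface, frame):
    client = httpclient.InferenceServerClient(url="localhost:8000")
    infer_input = httpclient.InferInput("input", list(frame.shape), "UINT8")
    infer_input.set_data_from_numpy(frame)
    result = client.infer("detector", inputs=[infer_input])
    print(result.as_numpy("output"))
```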
I would be interested in 3., where VPF handles decoding, scaling, etc. on the GPU and also handles asynchronous batch creation from multiple video streams.
Are there any updates on this?
Since Triton Inference Server has a Python backend, would it be possible to add VPF to the server simply via something like pip install?
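For context, a Triton Python backend is a model.py implementing the TritonPythonModel interface, so VPF could in principle run inside one, as long as PyNvCodec is importable in the backend's Python environment (VPF is built from source rather than installed from PyPI). Below is a minimal sketch under that assumption, with a hypothetical model config declaring an "ENCODED" input carrying raw H.264 packets and a "FRAME" output; the decoder may buffer several packets before producing a surface.

```python
# model.py -- sketch of a Triton Python backend that decodes with VPF.
# Assumes PyNvCodec is importable inside the backend environment and a
# model config with an ENCODED (UINT8 packet bytes) input and FRAME output.
import numpy as np
import PyNvCodec as nvc
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        gpu_id = 0
        w, h = 1920, 1080  # fixed stream geometry for the sketch
        # Standalone decoder: fed raw H.264 packets instead of a file.
        self.dec = nvc.PyNvDecoder(w, h, nvc.PixelFormat.NV12,
                                   nvc.CudaVideoCodec.H264, gpu_id)
        self.dwn = nvc.PySurfaceDownloader(w, h, nvc.PixelFormat.NV12, gpu_id)
        self.frame = np.ndarray(shape=(w * h * 3 // 2,), dtype=np.uint8)

    def execute(self, requests):
        responses = []
        for request in requests:
            packet = pb_utils.get_input_tensor_by_name(
                request, "ENCODED").as_numpy()
            surface = self.dec.DecodeSurfaceFromPacket(packet)
            # The decoder can return an empty surface while it buffers.
            if not surface.Empty():
                self.dwn.DownloadSingleSurface(surface, self.frame)
            out = pb_utils.Tensor("FRAME", self.frame)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```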
Description is taken from #205 by @philipp-schmidt
Triton is basically a server running your ML networks and offering HTTP or gRPC endpoints for them. Clients can send a request to invoke the networks and receive results without having to use ML frameworks themselves or bother with implementation details, GPU acceleration, etc. It's by far the best tool to scale ML workloads, imo.
To send a request to the server, there are basically three options for passing the data (e.g. an image to analyze via object detection):
- Send the data over the network as part of the request itself.
- Write the data to shared system memory and pass a handle to it.
- Write the data to shared CUDA (GPU) memory and pass a handle to it.
Obviously the performance increases with the order in the list, as the number of memory copy operations goes down to basically zero with sharing GPU memory (sketched below).
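Here is what the third option looks like from the client side, using Triton's CUDA shared memory utilities. Region, model, and tensor names are placeholders. One caveat: create_shared_memory_region() allocates its own CUDA buffer, so in a real pipeline a VPF surface would still need a device-to-device copy into that region; the numpy fill below is a stand-in for that step.

```python
# Sketch of the CUDA shared memory path from the Triton client side.
# Names ("frame_region", "detector", "input") are placeholders.
import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils.cuda_shared_memory as cudashm

gpu_id = 0
frame = np.zeros(1920 * 1080 * 3 // 2, dtype=np.uint8)  # stand-in frame
byte_size = frame.nbytes

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Allocate a CUDA region, fill it, and register it with the server.
handle = cudashm.create_shared_memory_region("frame_region", byte_size, gpu_id)
cudashm.set_shared_memory_region(handle, [frame])
client.register_cuda_shared_memory(
    "frame_region", cudashm.get_raw_handle(handle), gpu_id, byte_size)

# The input references the region instead of carrying the bytes itself.
infer_input = grpcclient.InferInput("input", list(frame.shape), "UINT8")
infer_input.set_shared_memory("frame_region", byte_size)
result = client.infer("detector", inputs=[infer_input])

client.unregister_cuda_shared_memory("frame_region")
cudashm.destroy_shared_memory_region(handle)
```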
It would obviously be very useful if VPF could simply provide the necessary pointers to the data in GPU memory to Triton Inference Server. This would mean that VPF needs a very good set of processing layers (like the already existing scaling and color conversion operations), because most networks expect a very well-defined input, and the conversion needs to happen on the GPU; a sketch of such a preprocessing chain follows below. And there might be things like batching (computing multiple images at once) which could make things more complicated.
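For reference, VPF's existing processing layers can already be chained on the GPU along these lines. The network input size is hypothetical, and the exact Execute() signatures vary between VPF versions (newer ones take a ColorspaceConversionContext argument), so treat this as illustrative.

```python
# Sketch of an on-GPU preprocessing chain in VPF: decode -> resize ->
# color convert, all without leaving device memory.
import PyNvCodec as nvc

gpu_id = 0
net_w, net_h = 640, 640  # hypothetical network input size

nv_dec = nvc.PyNvDecoder("input.mp4", gpu_id)
resizer = nvc.PySurfaceResizer(net_w, net_h, nvc.PixelFormat.NV12, gpu_id)
to_rgb = nvc.PySurfaceConverter(net_w, net_h, nvc.PixelFormat.NV12,
                                nvc.PixelFormat.RGB, gpu_id)
to_planar = nvc.PySurfaceConverter(net_w, net_h, nvc.PixelFormat.RGB,
                                   nvc.PixelFormat.RGB_PLANAR, gpu_id)

surface = nv_dec.DecodeSingleSurface()
if not surface.Empty():
    surface = resizer.Execute(surface)
    surface = to_rgb.Execute(surface)
    surface = to_planar.Execute(surface)  # CHW layout most networks expect
    # surface.PlanePtr().GpuMem() is the pointer Triton would consume.
```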