Triton server orchestration for production deployment

The Triton server(s) could be organized in several different ways for a realistic production deployment.

A. One server per model

Requires a central map from model name to server IP address
Does this imply one model per GPU?
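
For option A, the central map could be as simple as a dictionary keyed by model name. The following is only a minimal sketch: the model names, the addresses, and the `resolve_server` helper are all hypothetical, and a real deployment would presumably keep this map in a shared service or config rather than in code.

```python
from typing import Dict

# Hypothetical registry: model name -> "host:port" of the server hosting it.
# Names and addresses are placeholders, not real deployment values.
MODEL_TO_SERVER: Dict[str, str] = {
    "model_a": "10.0.0.1:8001",
    "model_b": "10.0.0.2:8001",
}


def resolve_server(model_name: str,
                   registry: Dict[str, str] = MODEL_TO_SERVER) -> str:
    """Return the address of the server registered for model_name."""
    try:
        return registry[model_name]
    except KeyError:
        # Fail loudly so a missing entry is not mistaken for a server outage.
        raise LookupError(
            f"no server registered for model {model_name!r}"
        ) from None
```

A client would call `resolve_server("model_a")` before opening its connection. A hybrid (option C) could reuse the same shape with several models mapping to the same address.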

B. Single server for all models (and all GPUs)

Load balancing already works well
Need to ensure that serving multiple models from one server can be done efficiently

C. Some hybrid of A and B

D. Other?

In addition, it's likely that each Tier1/Tier2 site, at least, would eventually have its own GPU servers (to reduce latency). The IP addresses of each site's server(s) could be tracked in e.g. site-local-config.xml or another appropriate part of the production infrastructure.
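
As a rough illustration of tracking per-site servers in site-local-config.xml, the sketch below reads server addresses from an XML fragment. The `<triton-server>` element and its `address`/`port` attributes are assumptions made up for this example and are not part of any existing schema; the site name is a placeholder too.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical fragment: <triton-server> is an invented element used only
# to illustrate the idea; it is not an existing site-local-config.xml tag.
EXAMPLE_CONFIG = """\
<site-local-config>
  <site name="T2_EXAMPLE">
    <triton-server address="10.0.0.1" port="8001"/>
    <triton-server address="10.0.0.2" port="8001"/>
  </site>
</site-local-config>
"""


def site_servers(xml_text: str) -> List[str]:
    """Return "address:port" for every <triton-server> element found."""
    root = ET.fromstring(xml_text)
    return [
        f"{el.get('address')}:{el.get('port')}"
        for el in root.iter("triton-server")
    ]
```

With the fragment above, `site_servers(EXAMPLE_CONFIG)` yields the two local addresses, which a job could then try in order or pass to a load balancer.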

Triton 2.x supports HTTPS/SSL, which could potentially be used for client-server authentication in production to maintain security.
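
To make the authentication idea concrete, here is a minimal sketch of the client-side TLS settings using only Python's standard `ssl` module. It is not the Triton client API: the `make_client_tls_context` helper and its file-path parameters are hypothetical, and how the resulting settings are handed to an actual Triton client is left open.

```python
import ssl
from typing import Optional


def make_client_tls_context(ca_file: Optional[str] = None,
                            cert_file: Optional[str] = None,
                            key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS context that verifies the server certificate.

    If cert_file is given, the client also presents its own certificate,
    which is what lets the server authenticate the client (mutual TLS).
    """
    # Verifies the server certificate and hostname by default.
    # ca_file=None falls back to the system's trusted CA store.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file is not None:
        # Client certificate and private key for server-side verification.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

Server verification alone protects against talking to the wrong endpoint; restricting who may submit inference requests would additionally need the client certificate path (or some other credential) on every site.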

attn: @violatingcp @holzman @mapsacosta
