fastmachinelearning / SonicCMS

Services for Optimized Network Inference on Coprocessors (for CMS)

Triton server orchestration for production deployment #18

Open kpedro88 opened 4 years ago

The Triton server(s) could be organized in several different ways for a realistic production deployment.

A. One server per model

B. Single server for all models (and all GPUs)

C. Some hybrid of A and B

D. Other?
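
To make options A to C concrete, here is a minimal sketch of how each layout could be expressed as a model-to-server routing table on the client side. The function names, model names, and addresses are all hypothetical, chosen only for illustration; nothing here is part of SonicCMS or Triton.

```python
from typing import Dict, List


def one_server_per_model(models: List[str], servers: List[str]) -> Dict[str, str]:
    """Option A: every model gets its own dedicated server."""
    if len(servers) < len(models):
        raise ValueError("option A needs at least one server per model")
    return dict(zip(models, servers))


def single_server(models: List[str], server: str) -> Dict[str, str]:
    """Option B: one server hosts all models."""
    return {model: server for model in models}


def hybrid(models: List[str], dedicated: Dict[str, str], shared: str) -> Dict[str, str]:
    """Option C: some models have a dedicated server, the rest share one."""
    return {model: dedicated.get(model, shared) for model in models}


if __name__ == "__main__":
    # Hypothetical model names and addresses.
    models = ["model_a", "model_b"]
    print(one_server_per_model(models, ["gpu1.example.org:8001", "gpu2.example.org:8001"]))
    print(single_server(models, "gpu0.example.org:8001"))
    print(hybrid(models, {"model_a": "gpu1.example.org:8001"}, "gpu0.example.org:8001"))
```

In all three cases the client only needs a lookup from model name to address, so the choice between A, B, and C could stay a deployment decision rather than a client-code change.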

In addition, it's likely that each Tier1/Tier2 site would eventually have its own GPU server(s) (to reduce latency). The IP addresses of each site's server(s) could be tracked in e.g. site-local-config.xml or another appropriate part of the production infrastructure.
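
As a rough sketch of what per-site tracking could look like, the snippet below reads server addresses from an XML fragment. The `<inference-servers>` and `<server>` elements, their attributes, and the site name are invented for this example and are not the actual site-local-config.xml schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical fragment: these element and attribute names are assumptions,
# not the real site-local-config.xml layout.
EXAMPLE_CONFIG = """
<site-local-config>
  <site name="T2_EXAMPLE">
    <inference-servers>
      <server model="model_a" url="gpu1.example.org:8001"/>
      <server model="model_b" url="gpu2.example.org:8001"/>
    </inference-servers>
  </site>
</site-local-config>
"""


def servers_for_site(xml_text: str, site_name: str) -> List[Tuple[str, str]]:
    """Return (model, url) pairs listed for the given site, or [] if none."""
    root = ET.fromstring(xml_text.strip())
    for site in root.iter("site"):
        if site.get("name") == site_name:
            return [(s.get("model"), s.get("url")) for s in site.iter("server")]
    return []


if __name__ == "__main__":
    for model, url in servers_for_site(EXAMPLE_CONFIG, "T2_EXAMPLE"):
        print(model, url)
```

A job running at a given site would then pick up the local address without any per-site changes to the job configuration.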

Triton 2.X supports HTTPS/SSL, which could potentially be used for client-server authentication in production.
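
For reference, this is roughly what the client side of such a setup involves, sketched with Python's standard `ssl` module only. The helper name and the file-path parameters are hypothetical, and how the resulting context is handed to a Triton client is not shown here.

```python
import ssl
from typing import Optional


def make_client_tls_context(ca_file: Optional[str] = None,
                            cert_file: Optional[str] = None,
                            key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS context that verifies the server and, if a client
    certificate is supplied, presents it for mutual authentication."""
    # Verifies the server certificate and hostname by default.
    # With ca_file=None the system's default CA certificates are used.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file is not None:
        # The client certificate lets the server authenticate the client.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


if __name__ == "__main__":
    ctx = make_client_tls_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Server verification alone protects the client from talking to the wrong endpoint; restricting which clients may use a site's GPUs would additionally need the client-certificate branch (or some other credential) enforced on the server side.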

attn: @violatingcp @holzman @mapsacosta