Open SidneyLann opened 1 year ago
Description

What are the solutions for concurrency in AI model inference with DJL? Can multiple threads access the same model at the same time? Is Nvidia Triton supported? (A sketch of the multi-threading pattern follows these questions.)
Will this change the current API? How?
Who will benefit from this enhancement?
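On the multi-threading question, DJL's documentation recommends sharing a loaded model across threads while giving each thread its own Predictor, since a Predictor is not thread-safe. Below is a minimal sketch of that pattern; the resnet artifact id, the image file, and the pool size are placeholder assumptions, not taken from this issue:

```java
import ai.djl.inference.Predictor;
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.modality.cv.ImageFactory;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MultiThreadedInference {

    public static void main(String[] args) throws Exception {
        // Placeholder criteria: any image-classification model from the model zoo works.
        Criteria<Image, Classifications> criteria =
                Criteria.builder()
                        .setTypes(Image.class, Classifications.class)
                        .optArtifactId("resnet")
                        .build();

        // The loaded model can be shared across threads.
        try (ZooModel<Image, Classifications> model = criteria.loadModel()) {
            Image img = ImageFactory.getInstance().fromFile(Paths.get("kitten.jpg"));

            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 4; i++) {
                pool.submit(() -> {
                    // Each thread creates its own Predictor; Predictor itself
                    // is not thread-safe, but newPredictor() is cheap.
                    try (Predictor<Image, Classifications> predictor = model.newPredictor()) {
                        Classifications result = predictor.predict(img);
                        System.out.println(result.best());
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }
}
```

A common variant keeps one long-lived predictor per worker thread (for example via ThreadLocal) instead of creating one per task, which avoids repeated setup on hot paths.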
DJL is a low-level library. We provide DJLServing, a model server designed as a general inference platform, and we do support running tritoncore inside DJLServing. Please take a look: https://docs.djl.ai/master/docs/serving/index.html
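If you go the DJLServing route, concurrency is handled server-side, so clients simply issue parallel HTTP requests against the inference endpoint. A minimal client sketch, assuming a local DJLServing instance on its default port 8080 with a model registered under the name resnet (both are illustrative assumptions):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Paths;

public class ServingClient {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Assumed endpoint: DJLServing's /predictions/{model_name} inference API.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/predictions/resnet"))
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofFile(Paths.get("kitten.jpg")))
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```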