Open SidneyLann opened 1 year ago
Description

What are the solutions for concurrency in AI model inference with DJL? Can multiple threads access the same model at the same time? Is Nvidia Triton supported? (A sketch of the multi-threading pattern follows these questions.)
Will this change the current API? How?
Who will benefit from this enhancement?
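On the multi-threading question, DJL's documentation recommends sharing a loaded model across threads while giving each thread its own Predictor, since a Predictor is not thread-safe. Below is a minimal sketch of that pattern; the resnet artifact id, the image file, and the pool size are placeholder assumptions, not taken from this issue:

```java
import ai.djl.inference.Predictor;
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.modality.cv.ImageFactory;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MultiThreadedInference {

    public static void main(String[] args) throws Exception {
        // Placeholder criteria: any image-classification model from the model zoo works.
        Criteria<Image, Classifications> criteria =
                Criteria.builder()
                        .setTypes(Image.class, Classifications.class)
                        .optArtifactId("resnet")
                        .build();

        // The loaded model can be shared across threads.
        try (ZooModel<Image, Classifications> model = criteria.loadModel()) {
            Image img = ImageFactory.getInstance().fromFile(Paths.get("kitten.jpg"));

            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 4; i++) {
                pool.submit(() -> {
                    // Each thread creates its own Predictor; Predictor itself
                    // is not thread-safe, but newPredictor() is cheap.
                    try (Predictor<Image, Classifications> predictor = model.newPredictor()) {
                        Classifications result = predictor.predict(img);
                        System.out.println(result.best());
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }
}
```

A common variant keeps one long-lived predictor per worker thread (for example via ThreadLocal) instead of creating one per task, which avoids repeated setup on hot paths.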
DJL is a low-level library. We provide DJLServing, a model server designed as a general inference platform, and we do support running tritoncore inside DJLServing. Please take a look: https://docs.djl.ai/master/docs/serving/index.html
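If you go the DJLServing route, concurrency is handled server-side, so clients simply issue parallel HTTP requests against the inference endpoint. A minimal client sketch, assuming a local DJLServing instance on its default port 8080 with a model registered under the name resnet (both are illustrative assumptions):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Paths;

public class ServingClient {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Assumed endpoint: DJLServing's /predictions/{model_name} inference API.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/predictions/resnet"))
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofFile(Paths.get("kitten.jpg")))
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```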