Foundation model benchmarking tool. Run any model on any AWS platform and benchmark its performance across instance type and serving stack options.
Integration of Triton Inference Server with DJL #204
Closed
madhurprash closed 1 month ago
This PR refactors the code for running Triton on AWS chips using vLLM and DJL. It has been tested with Triton (on both the vLLM and DJL backends) and against a previous DJL file; a query sketch is shown below.
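For context, a minimal sketch of how a Triton Inference Server endpoint backed by vLLM (or DJL) can be queried with `tritonclient` over HTTP. The URL, model name, and the `text_input`/`text_output` tensor names are assumptions here and must match the deployed model's configuration:

```python
# Sketch only (not part of this PR): send one prompt to a Triton endpoint.
# Assumes a Triton server is reachable on localhost:8000 and serves a model
# named "vllm_model" whose config exposes "text_input"/"text_output" tensors.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Check that the server and the target model are up before sending requests.
assert client.is_server_live()
assert client.is_model_ready("vllm_model")  # hypothetical model name

# Build a single-element BYTES tensor carrying the prompt text.
prompt = np.array(["What is Amazon SageMaker?".encode("utf-8")], dtype=np.object_)
text_input = httpclient.InferInput("text_input", [1], "BYTES")
text_input.set_data_from_numpy(prompt)

# Ask the server to return the generated text.
text_output = httpclient.InferRequestedOutput("text_output")
response = client.infer(
    model_name="vllm_model",  # hypothetical model name
    inputs=[text_input],
    outputs=[text_output],
)
print(response.as_numpy("text_output"))
```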
To do: