Foundation model benchmarking tool. Run any model on any AWS platform and benchmark its performance across instance type and serving stack options.
Integration of Triton Inference Server with DJL #204
Closed
madhurprash closed 1 month ago
This PR refactors the code for running Triton on AWS chips using vLLM and DJL. It has been tested with Triton (on both the vLLM and DJL backends) and against a previous DJL file; a query sketch is shown below.
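For context, a minimal sketch of how a Triton Inference Server endpoint backed by vLLM (or DJL) can be queried with `tritonclient` over HTTP. The URL, model name, and the `text_input`/`text_output` tensor names are assumptions here and must match the deployed model's configuration:

```python
# Sketch only (not part of this PR): send one prompt to a Triton endpoint.
# Assumes a Triton server is reachable on localhost:8000 and serves a model
# named "vllm_model" whose config exposes "text_input"/"text_output" tensors.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Check that the server and the target model are up before sending requests.
assert client.is_server_live()
assert client.is_model_ready("vllm_model")  # hypothetical model name

# Build a single-element BYTES tensor carrying the prompt text.
prompt = np.array(["What is Amazon SageMaker?".encode("utf-8")], dtype=np.object_)
text_input = httpclient.InferInput("text_input", [1], "BYTES")
text_input.set_data_from_numpy(prompt)

# Ask the server to return the generated text.
text_output = httpclient.InferRequestedOutput("text_output")
response = client.infer(
    model_name="vllm_model",  # hypothetical model name
    inputs=[text_input],
    outputs=[text_output],
)
print(response.as_numpy("text_output"))
```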
To do: