basetenlabs / truss

The simplest way to serve AI/ML models in production
https://truss.baseten.co
MIT License

Fix batch flow for briton #1022

Closed. pankajroark closed this 1 week ago

pankajroark commented 2 weeks ago

The batch flow for briton just needed to be wired up. The implementation corresponds exactly to that of the triton version.

I've also reverted the gemm_plugin default value chain for now. We can bring it back when we're closer to supporting the latest trtllm version.