basetenlabs / truss

The simplest way to serve AI/ML models in production
https://truss.baseten.co
MIT License
857 stars, 61 forks

add num builder gpus config #1002

Closed: joostinyi closed this 3 weeks ago

joostinyi commented 3 weeks ago

:rocket: What

linear[bot] commented 3 weeks ago

BT-11273 Engine builder: FP8 quantization needs more GPU memory than model, support this