basetenlabs / truss-examples

Examples of models deployable with Truss
https://trussml.com
MIT License
103 stars 24 forks

reduce Llama 3 70B from 4 to 2 H100s #279

Closed: philipkiely-baseten closed this 2 months ago

philipkiely-baseten commented 2 months ago

Llama 3 70B needs only two 80 GB GPUs, not four, to run inference. I've updated the config accordingly and tested the new config in production.
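For reviewers who want a quick sanity check on the GPU count, here is a rough back-of-the-envelope sketch. It is not taken from this PR: it assumes a round 70 billion parameters, 16-bit (2-byte) weights, and 80 GB per H100, and the helper `min_gpus_for_weights` is a hypothetical name used only for illustration. It counts weight memory alone, so it gives a lower bound rather than a guarantee.

```python
import math


def min_gpus_for_weights(n_params: float, bytes_per_param: float, gpu_mem_gb: float) -> int:
    """Smallest GPU count whose combined memory holds the weights alone.

    Hypothetical helper for illustration; ignores KV cache, activations,
    and runtime overhead.
    """
    weights_gb = n_params * bytes_per_param / 1e9
    return math.ceil(weights_gb / gpu_mem_gb)


# Assumptions (not stated in the PR): round parameter count and 16-bit weights.
N_PARAMS = 70e9
BYTES_PER_PARAM = 2
GPU_MEM_GB = 80

weights_gb = N_PARAMS * BYTES_PER_PARAM / 1e9
gpus = min_gpus_for_weights(N_PARAMS, BYTES_PER_PARAM, GPU_MEM_GB)
headroom_gb = gpus * GPU_MEM_GB - weights_gb

print(f"weights: {weights_gb:.0f} GB, minimum GPUs: {gpus}, headroom: {headroom_gb:.0f} GB")
```

Under these assumptions the weights fit on two 80 GB GPUs, with the remaining memory left for the KV cache and activations. Whether that headroom is enough depends on batch size and sequence length, which is why the production test mentioned above matters more than this estimate.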