reduce Llama 3 70B from 4 to 2 H100s

basetenlabs / truss-examples

Examples of models deployable with Truss

https://trussml.com

MIT License

103 stars 24 forks

Closed philipkiely-baseten closed 2 months ago

philipkiely-baseten commented 2 months ago

Llama 3 70B needs only two 80 GB GPUs to run inference, not four. I've changed the config and tested the updated config in production.
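
As a rough sanity check on the two-GPU figure, here is a back-of-the-envelope memory estimate. It is a sketch under stated assumptions, not taken from the PR: it assumes a nominal 70 billion parameters stored as 16-bit weights (2 bytes each) and counts weights only, ignoring the KV cache and activations. The function names are hypothetical.

```python
import math

GPU_MEMORY_GB = 80.0  # memory per GPU, as stated in the comment above


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes 16-bit weights; that precision is an
    assumption of this sketch, not something the PR states.
    """
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(num_params: float,
                         gpu_memory_gb: float = GPU_MEMORY_GB,
                         bytes_per_param: int = 2) -> int:
    """Smallest GPU count whose combined memory holds the weights."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


weights_gb = weight_memory_gb(70e9)                  # nominal 70B parameters
gpus = min_gpus_for_weights(70e9)
headroom_gb = gpus * GPU_MEMORY_GB - weights_gb      # left for KV cache, activations

print(f"weights: {weights_gb:.0f} GB, GPUs: {gpus}, headroom: {headroom_gb:.0f} GB")
```

Under these assumptions the weights take about 140 GB, which fits in the 160 GB that two 80 GB GPUs provide. The remaining headroom is what the KV cache and activations have to share, so the batch size and context length that work in practice depend on the serving setup.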