Open WangLabTHU opened 4 months ago
How much GPU memory do the models "evo-1-8k-base" and "evo-1-131k-base" use? Could you consider two scenarios, inference and fine-tuning? Are there any GPUs you would recommend for running Evo?

We recommend A100s or H100s.

For inference, a single GPU is enough.

Finetuning is more complicated: it depends on the sequence length and how long you want to train. For sequences of 8k+ tokens, you will likely need 8+ GPUs plus model sharding to split the model across devices so it fits in memory. There is a somewhat steep learning curve for these large models, so we focused instead on making lots of web APIs and wrappers available. At some point we hope to support finetuning through Together infrastructure.
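For a rough sense of scale: Evo-1 has on the order of 7B parameters, so its memory footprint can be estimated from bytes per parameter. The sketch below is a back-of-the-envelope estimate, not a measurement — the bf16 weights and Adam-style optimizer are assumptions, and activations and the KV/state cache are excluded, so real usage will be noticeably higher (especially at long sequence lengths):

```python
def estimate_gpu_mem_gb(n_params_billion, bytes_per_param=2,
                        training=False, optimizer_bytes_per_param=12):
    """Rough GPU memory estimate in GB.

    Inference: weights only (bf16 = 2 bytes/param).
    Training:  weights + gradients + Adam optimizer state
               (fp32 master weights + two fp32 moments ~ 12 bytes/param).
    Activations and sequence-length-dependent caches are NOT included.
    """
    weights = n_params_billion * bytes_per_param
    if not training:
        return weights
    grads = n_params_billion * bytes_per_param
    opt = n_params_billion * optimizer_bytes_per_param
    return weights + grads + opt

# Assuming ~7B parameters for Evo-1:
print(estimate_gpu_mem_gb(7))                  # inference: ~14 GB
print(estimate_gpu_mem_gb(7, training=True))   # full finetune: ~112 GB
```

The ~14 GB inference figure explains why a single 40/80 GB A100 suffices, while the ~112 GB training total (before activations) is why full finetuning needs the model sharded across multiple GPUs, e.g. with FSDP or DeepSpeed ZeRO.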