Closed · jacklanda closed this 2 months ago
By the way, I would like to contribute to this project ☺️.
Could I file PRs for these enhancements directly? Thanks so much!
Thanks for your interest.
For the compute, it depends on your MoE method, e.g. Mixture-of-Adapters vs. typical MoE. The latter requires GPUs with larger memory, since the feed-forward layers are multiplied by the number of merged experts.
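For a rough sense of that footprint, here is a back-of-the-envelope sketch (not from mergoo itself; the dimensions are hypothetical Llama-3-8B-style values, so plug in your own):

```python
# Rough estimate of FFN weight memory when N experts' feed-forward
# layers are all kept in the merged model (Llama-style FFN with
# gate/up/down projections; dimensions below are assumptions).
def moe_ffn_params(hidden: int, intermediate: int, layers: int, experts: int) -> int:
    ffn_per_layer = 3 * hidden * intermediate  # gate + up + down projections
    return layers * experts * ffn_per_layer

# Llama-3-8B-like dims: hidden=4096, intermediate=14336, 32 layers, 4 experts.
params = moe_ffn_params(4096, 14336, 32, experts=4)
print(f"~{params * 2 / 1e9:.1f} GB for FFN weights alone in fp16")  # ~45.1 GB
```

With Mixture-of-Adapters, only the small adapter weights are duplicated per expert, which is why it fits on much smaller GPUs.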
Yes, mergoo supports Llama 3 based experts. Please check the tutorial here.
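As a quick sketch of what that looks like (following the pattern in the README; the model IDs are placeholders and the exact config keys may differ, so defer to the tutorial):

```python
import torch
from mergoo.compose_experts import ComposeExperts

# Compose Llama-3 based experts into one MoE checkpoint.
config = {
    "model_type": "llama",
    "num_experts_per_tok": 2,
    "experts": [
        {"expert_name": "base_expert", "model_id": "meta-llama/Meta-Llama-3-8B"},
        {"expert_name": "code_expert", "model_id": "your-org/llama3-code-ft"},  # placeholder
    ],
    # Route on the feed-forward projections of each transformer block.
    "router_layers": ["gate_proj", "up_proj", "down_proj"],
}

merger = ComposeExperts(config, torch_dtype=torch.float16)
merger.compose()
merger.save_checkpoint("data/llama3_moe")
```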
For the contribution, feel free to PR directly :)
I love this amazing project!
Two remaining questions: