Closed · jacklanda closed this 2 months ago
By the way, I would like to contribute to this project ☺️.
Could I file PRs for these enhancements directly? Thanks so much!
Thanks for your interest.
For the compute, it depends on your MoE method, e.g. Mixture-of-Adapters vs. typical MoE. The latter requires GPUs with larger memory, since the feed-forward layers are multiplied by the number of merged experts.
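For a rough sense of that footprint, here is a back-of-the-envelope sketch (not from mergoo itself; the dimensions are hypothetical Llama-3-8B-style values, so plug in your own):

```python
# Rough estimate of FFN weight memory when N experts' feed-forward
# layers are all kept in the merged model (Llama-style FFN with
# gate/up/down projections; dimensions below are assumptions).
def moe_ffn_params(hidden: int, intermediate: int, layers: int, experts: int) -> int:
    ffn_per_layer = 3 * hidden * intermediate  # gate + up + down projections
    return layers * experts * ffn_per_layer

# Llama-3-8B-like dims: hidden=4096, intermediate=14336, 32 layers, 4 experts.
params = moe_ffn_params(4096, 14336, 32, experts=4)
print(f"~{params * 2 / 1e9:.1f} GB for FFN weights alone in fp16")  # ~45.1 GB
```

With Mixture-of-Adapters, only the small adapter weights are duplicated per expert, which is why it fits on much smaller GPUs.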
Yes, mergoo supports Llama 3 based experts. Please check the tutorial here.
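As a quick sketch of what that looks like (following the pattern in the README; the model IDs are placeholders and the exact config keys may differ, so defer to the tutorial):

```python
import torch
from mergoo.compose_experts import ComposeExperts

# Compose Llama-3 based experts into one MoE checkpoint.
config = {
    "model_type": "llama",
    "num_experts_per_tok": 2,
    "experts": [
        {"expert_name": "base_expert", "model_id": "meta-llama/Meta-Llama-3-8B"},
        {"expert_name": "code_expert", "model_id": "your-org/llama3-code-ft"},  # placeholder
    ],
    # Route on the feed-forward projections of each transformer block.
    "router_layers": ["gate_proj", "up_proj", "down_proj"],
}

merger = ComposeExperts(config, torch_dtype=torch.float16)
merger.compose()
merger.save_checkpoint("data/llama3_moe")
```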
For the contribution, feel free to PR directly :)
I love this amazing project!
Two remaining questions: