Adding Mixtral-8x22b

rdyro commented 4 weeks ago

Adds the mixtral-8x22b config (to pyconfig as well).
Improves the llama and mistral conversion script: weights are now written in place to reduce total RAM usage, and progress tracking is better.
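
To illustrate what "in-place weight writing" generally means, here is a minimal sketch. It is not the code from this PR: the function name `merge_shards_in_place`, the loader callables, and the use of the stdlib `array` module are assumptions made for illustration (a real conversion script would presumably operate on NumPy or JAX arrays). The idea is to preallocate the destination once and copy each shard into its slice, so only one shard is held alongside the output at any time, instead of collecting all shards in a list and concatenating them.

```python
from array import array


def merge_shards_in_place(shard_loaders, shard_len):
    """Merge equally sized shards into one preallocated float32 buffer.

    shard_loaders: list of zero-argument callables (hypothetical), each
    returning an array('f') of length shard_len when called.
    """
    total = shard_len * len(shard_loaders)
    out = array("f", [0.0]) * total  # allocate the destination once

    for i, load_shard in enumerate(shard_loaders):
        shard = load_shard()  # only this one shard is live besides `out`
        if len(shard) != shard_len:
            raise ValueError(f"shard {i} has length {len(shard)}, expected {shard_len}")
        start = i * shard_len
        out[start:start + shard_len] = shard  # same-length slice write, no resize
        del shard  # drop the reference before loading the next shard
    return out
```

With this pattern, peak memory is roughly the size of the output plus one shard, rather than the output plus all shards.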

rdyro commented 4 weeks ago

End-to-end tests on ml-auto-solutions are in a PR here

peregilk commented 1 week ago

@rdyro I encountered OOM (Out of Memory) errors when loading a Llama 8B model on a v4-8 after this commit. It appears to be related to llama_or_mistral_ckpt.py. While I haven’t pinpointed the exact cause, reverting to the version of MaxText/llama_or_mistral_ckpt.py from commit aef1bb0b60c89b6c9876e89ce0b0c35b759235d7 resolves the issue.

I can reproduce the error with the not-yet-merged script from @A9isha, llama_or_mistral_orbax_to_huggingface.py, where the failure occurs at line 97. Other code paths that load checkpoints in a similar way will likely trigger the same error.
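
One possible way to narrow down a regression like this is to compare the peak host memory of two loading strategies. The sketch below is only an illustration under stated assumptions: `peak_bytes`, `load_all_then_join`, and `load_in_place` are hypothetical stand-ins, not functions from MaxText, and `tracemalloc` only sees allocations made through Python's allocators on the host, so it says nothing about TPU device memory.

```python
import tracemalloc


def peak_bytes(fn, *args, **kwargs):
    """Run fn and return (result, peak traced bytes during the call)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def load_all_then_join(n_shards, shard_bytes):
    # Hypothetical loader: keeps every shard alive, then builds a joined copy.
    shards = [bytes(shard_bytes) for _ in range(n_shards)]
    return b"".join(shards)


def load_in_place(n_shards, shard_bytes):
    # Hypothetical loader: one preallocated buffer, one shard live at a time.
    out = bytearray(n_shards * shard_bytes)
    for i in range(n_shards):
        shard = bytes(shard_bytes)
        out[i * shard_bytes:(i + 1) * shard_bytes] = shard
    return out


if __name__ == "__main__":
    _, joined_peak = peak_bytes(load_all_then_join, 8, 1 << 20)
    _, in_place_peak = peak_bytes(load_in_place, 8, 1 << 20)
    print(f"join peak: {joined_peak} bytes, in-place peak: {in_place_peak} bytes")
```

Wrapping the old and new versions of a loader in the same helper gives a like-for-like peak figure on the host, which can help decide whether the regression is in host RAM at all.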

AI-Hypercomputer / maxtext

Adding Mixtral-8x22b #845