AI-Hypercomputer / maxtext

A simple, performant and scalable Jax LLM!

Converting a Gemma MaxText-compatible checkpoint to Hugging Face format #829

Open salrowili opened 1 month ago

salrowili commented 1 month ago

I have looked around for a script that converts MaxText Gemma and Gemma 2 checkpoints to Hugging Face format, but I have not found anything. This may be related to https://github.com/google/maxtext/pull/581

salrowili commented 1 month ago

Any update on this?

gobbleturk commented 1 week ago

Indeed, https://github.com/google/maxtext/pull/581 was adding support for this. Out of curiosity, what is your use case for it?

salrowili commented 1 week ago

Hi @gobbleturk, https://github.com/google/maxtext/pull/581 does not work with Gemma 2 because Gemma 2 interleaves local (sliding-window) and global attention layers; I believe each q, k, and v projection appears in a local layer followed by a global one, so the weight mapping differs from Gemma 1. My use case: I did continual pre-training of the Gemma 2 2B model on a monolingual pre-training dataset, and I want to use the HF SFT trainer for supervised fine-tuning.
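In case it helps anyone sketching a converter in the meantime, here is a minimal illustration (not an official script) of how paired local/global kernels could be interleaved into Hugging Face layer indices. The pair-stacked checkpoint layout, the local-first ordering, and the kernel axis order are all assumptions about the MaxText Gemma 2 checkpoint structure, so verify them against your actual param tree before use:

```python
import numpy as np

# Assumption: MaxText scans Gemma 2 decoder layers in pairs (one local
# sliding-window layer followed by one global layer), so a scanned kernel
# carries a leading axis of num_pairs = num_layers // 2.
NUM_PAIRS = 13  # Gemma 2 2B has 26 decoder layers -> 13 pairs (assumption)
EMBED, NUM_HEADS, HEAD_DIM = 2304, 8, 256  # Gemma 2 2B query-projection dims


def interleave_q(local_kernel: np.ndarray, global_kernel: np.ndarray) -> dict:
    """Map pair-stacked MaxText q kernels to per-layer HF weights.

    local_kernel / global_kernel: [num_pairs, embed, num_heads, head_dim]
    Returns {hf_layer_idx: array of shape [num_heads * head_dim, embed]},
    with local layers at even indices and global layers at odd indices
    (assumed ordering; Gemma 2 alternates the two attention types).
    """
    hf_weights = {}
    for pair in range(local_kernel.shape[0]):
        for offset, kernel in ((0, local_kernel), (1, global_kernel)):
            hf_idx = 2 * pair + offset
            # Assumed MaxText layout [embed, heads, head_dim]; HF linear
            # layers expect [out_features, in_features], i.e.
            # [heads * head_dim, embed], so reshape then transpose.
            w = kernel[pair].reshape(EMBED, NUM_HEADS * HEAD_DIM).T
            hf_weights[hf_idx] = w
    return hf_weights


# Dummy arrays standing in for the restored checkpoint params.
local = np.zeros((NUM_PAIRS, EMBED, NUM_HEADS, HEAD_DIM), dtype=np.float32)
glob = np.zeros_like(local)
q_by_layer = interleave_q(local, glob)
# q_by_layer[i] would populate f"model.layers.{i}.self_attn.q_proj.weight".
print(len(q_by_layer), q_by_layer[0].shape)  # 26 layers, (2048, 2304)
```

The same interleaving would have to be repeated for the k, v, and output projections and the MLP/norm weights before serializing the result, e.g. to safetensors for `from_pretrained`.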

Thank you though for taking care of this.