salrowili opened this issue 1 month ago
Any update on this?
Indeed, https://github.com/google/maxtext/pull/581 adds support for this. Out of curiosity, what is your use case?
Hi @gobbleturk, https://github.com/google/maxtext/pull/581 does not work with Gemma 2 because Gemma 2 mixes local and global attention: I think the decoder alternates between a local (sliding-window) attention layer and a global one, each with its own q/k/v weights, so the layers cannot be mapped one-for-one the way they are for Gemma 1. My use case is that I did continual pre-training of the Gemma 2 2B model on a monolingual pre-training dataset, and I want to use the HF SFT trainer for supervised fine-tuning.
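For reference, here is a toy sketch of the extra bookkeeping a converter would need. This is not MaxText code; the parameter paths are made up, and the even/odd pattern is my reading of the HF Gemma 2 modeling code, so treat it as an assumption. The point is only that the per-layer mapping depends on the layer index instead of being uniform as it is for Gemma 1:

```python
# Toy illustration of the alternating local/global attention bookkeeping.
# The parameter paths below are hypothetical, not real MaxText/HF names.

def attention_kind(layer_idx: int) -> str:
    # Assumption from reading the HF Gemma 2 implementation: even-indexed
    # layers use local (sliding-window) attention, odd-indexed layers use
    # global attention.
    return "local" if layer_idx % 2 == 0 else "global"

def build_layer_mapping(num_layers: int) -> dict[str, str]:
    """Map hypothetical MaxText layer paths to HF layer paths."""
    mapping = {}
    for i in range(num_layers):
        # Hypothetical source path; the real checkpoint structure differs.
        src = f"decoder/layers_{i}/{attention_kind(i)}_self_attention"
        dst = f"model.layers.{i}.self_attn"
        mapping[src] = dst
    return mapping

print(build_layer_mapping(4))
```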
Thank you, though, for taking care of this.
I have looked around for a script that converts MaxText Gemma and Gemma 2 checkpoints to Hugging Face format, but I have not found anything. This may be related to https://github.com/google/maxtext/pull/581.
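In case it helps anyone picking this up, here is a minimal sketch of what such a conversion script could look like. It is an assumption-laden outline, not an existing MaxText tool: it assumes the checkpoint restores as a PyTree of arrays via Orbax, assumes the `Gemma2Config` defaults are adjusted to the real 2B values, and leaves the per-tensor renaming/transposing (including the local/global layer ordering from the sketch above) as a hypothetical `rename_and_reshape` helper:

```python
import orbax.checkpoint as ocp
import torch
from transformers import Gemma2Config, Gemma2ForCausalLM


def rename_and_reshape(maxtext_params: dict) -> dict[str, torch.Tensor]:
    """Hypothetical: flatten the MaxText PyTree into an HF state dict,
    renaming and transposing each tensor to HF conventions."""
    raise NotImplementedError("fill in the per-tensor mapping here")


def convert(maxtext_ckpt_dir: str, out_dir: str) -> None:
    # Restore the MaxText checkpoint as a nested dict of arrays
    # (assumes a standard Orbax PyTree checkpoint layout).
    params = ocp.PyTreeCheckpointer().restore(maxtext_ckpt_dir)

    # Instantiate an HF Gemma 2 model; fill in the real 2B config values.
    model = Gemma2ForCausalLM(Gemma2Config())

    # Load the converted weights and write a standard HF checkpoint dir,
    # which the HF SFT trainer can then consume directly.
    model.load_state_dict(rename_and_reshape(params), strict=True)
    model.save_pretrained(out_dir)
```

With `strict=True`, `load_state_dict` will flag any layer that was mapped to the wrong name, which is exactly where the alternating local/global pattern would otherwise fail silently.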