AI-Hypercomputer / maxtext

A simple, performant and scalable Jax LLM!

Llama3.1 (8B,70B,405B) 🦙 #838

Open khatwanimohit opened 1 month ago

khatwanimohit commented 1 month ago

Tested: http://shortn/_TVtieLHb4u http://shortn/_iIa7Kkdcj7

peregilk commented 1 month ago

Wouldn't RoPE theta scaling also need to be implemented for Llama 3.1 to work correctly, like it is done in HF? https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/blob/48d6d0fc4e02fb1269b36940650a1b7233035cbb/config.json#L21 (see the sketch below).

Or am I missing something here?
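
For reference, a minimal sketch of the Llama 3.1 frequency-scaling step in JAX, following the `rope_scaling` block in the HF config linked above (`factor=8.0`, `low_freq_factor=1.0`, `high_freq_factor=4.0`, `original_max_position_embeddings=8192`). The function name and signature are illustrative only, not MaxText's actual API:

```python
import jax.numpy as jnp

def apply_llama31_rope_scaling(freqs: jnp.ndarray) -> jnp.ndarray:
  """Rescale RoPE inverse frequencies per the Llama 3.1 rope_scaling config.

  `freqs` are the base inverse frequencies, e.g.
  1.0 / (rope_theta ** (jnp.arange(0, head_dim, 2) / head_dim))
  with rope_theta = 500000. Constants below come from the HF config.
  """
  scale_factor = 8.0        # rope_scaling.factor
  low_freq_factor = 1.0     # rope_scaling.low_freq_factor
  high_freq_factor = 4.0    # rope_scaling.high_freq_factor
  old_context_len = 8192.0  # rope_scaling.original_max_position_embeddings

  low_freq_wavelen = old_context_len / low_freq_factor
  high_freq_wavelen = old_context_len / high_freq_factor
  wavelen = 2.0 * jnp.pi / freqs

  # Smooth interpolation between scaled and unscaled frequencies for
  # wavelengths that fall between the two cutoffs.
  smooth = (old_context_len / wavelen - low_freq_factor) / (
      high_freq_factor - low_freq_factor
  )
  return jnp.where(
      wavelen < high_freq_wavelen,      # high-frequency band: keep as-is
      freqs,
      jnp.where(
          wavelen > low_freq_wavelen,   # low-frequency band: scale down
          freqs / scale_factor,
          (1.0 - smooth) * freqs / scale_factor + smooth * freqs,
      ),
  )
```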

peregilk commented 4 weeks ago

@khatwanimohit I am testing the script for converting the Meta checkpoints. Everything looks fine. However, for some reason the file scanned_chkpt/0/items/checkpoint is not written. This seems to be just a file for the state. The model weights seem to be stored in the bucket.

UPDATE: This seems to be just a status file, and since this is at checkpoint 0, it does not seem to matter. I can manually copy the file over from Llama 3 to fix this (see the sketch below).
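
In case it helps others, a minimal sketch of that manual copy using the google-cloud-storage client. The bucket name and the Llama 3 source path here are hypothetical placeholders, not the actual layout produced by the conversion script:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-maxtext-bucket")  # hypothetical bucket name

# Copy the Orbax status file from an existing Llama 3 checkpoint
# into the freshly converted Llama 3.1 checkpoint at step 0.
src = bucket.blob("llama3_scanned/0/items/checkpoint")  # hypothetical source path
bucket.copy_blob(src, bucket, "scanned_chkpt/0/items/checkpoint")
```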

kocchop commented 2 weeks ago

Hi, what's the ETA on the PR? I wanted to test the models on MaxText.