Well done! Another way to get the original llama3 without changing the code:
Modify the values with names similar to these in `allamo/train_configs/train_1B.json`:
```
dropout: 0,
dim: int = 4096
n_layers: int = 32
n_heads: int = 32
n_kv_heads: Optional[int] = None
rope_theta: float = 500000
max_batch_size: int = 32
max_seq_len: int = 2048
```
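For reference, these defaults match the `ModelArgs` dataclass in Meta's llama3 reference implementation (a sketch from memory, shown below; allamo's JSON config uses its own key names, so map them by similarity as described above):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelArgs:
    # Defaults as in Meta's llama3 reference model.py (8B-scale shapes).
    dim: int = 4096
    n_layers: int = 32
    n_heads: int = 32
    n_kv_heads: Optional[int] = None   # None means no grouped-query attention (n_kv_heads == n_heads)
    vocab_size: int = -1               # set from the tokenizer at load time
    multiple_of: int = 256             # SwiGLU hidden size is rounded up to a multiple of this
    ffn_dim_multiplier: Optional[float] = None
    norm_eps: float = 1e-5
    rope_theta: float = 500000
    max_batch_size: int = 32
    max_seq_len: int = 2048
```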
This is a great project: open-source training from scratch, simple and easy to use, and especially suitable for ordinary people.
The current SOTA models are highly similar to llama3. I hope everyone can train llama3 from scratch; that may help people interested in new algorithms to build on it, propose new algorithms, and promote social progress. Therefore, I have followed your project's format to implement a preliminary llama3, and I hope it can be merged into your project.
Replace the code in allamo/model/model.py with the following code, and it will work:
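(As a rough orientation only, and not the actual replacement code from this PR, here is a minimal, self-contained sketch of two llama3-specific pieces, RMSNorm and rotary position embeddings with `rope_theta = 500000`, written in plain PyTorch:)

```python
# Illustration only: minimal llama3-style RMSNorm and RoPE in plain PyTorch.
# This is NOT the full model file from the PR, just the distinctive pieces.
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square layer norm as used in llama-family models."""
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        norm = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return norm * self.weight


def precompute_freqs_cis(head_dim: int, seq_len: int, theta: float = 500000.0) -> torch.Tensor:
    """Precompute complex rotation factors for rotary position embeddings."""
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, freqs)                        # (seq_len, head_dim // 2)
    return torch.polar(torch.ones_like(freqs), freqs)    # complex64


def apply_rotary_emb(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    """Apply RoPE to a (batch, seq_len, n_heads, head_dim) query/key tensor."""
    x_ = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    freqs_cis = freqs_cis.view(1, x_.shape[1], 1, x_.shape[-1])
    out = torch.view_as_real(x_ * freqs_cis).flatten(3)
    return out.type_as(x)


if __name__ == "__main__":
    # Tiny smoke test with toy sizes.
    x = torch.randn(2, 16, 8, 64)                        # (batch, seq, heads, head_dim)
    freqs = precompute_freqs_cis(64, 16)
    print(apply_rotary_emb(x, freqs).shape)              # torch.Size([2, 16, 8, 64])
    print(RMSNorm(512)(torch.randn(2, 16, 512)).shape)   # torch.Size([2, 16, 512])
```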
The result looks like this: