Gadersd / llama2-burn

Llama2 LLM ported to Rust burn
MIT License
272 stars 17 forks

How to run this with mistral? #9

Open sigma-andex opened 10 months ago

sigma-andex commented 10 months ago

Hi,

I tried running this with mistralai_Mistral-7B-v0.1, which, to my understanding, uses the Llama architecture. However, testing the model only gives me gibberish. I used the following params.json:

{
    "dim": 4096,
    "multiple_of": 256,
    "n_heads": 32,
    "n_layers": 32,
    "norm_eps": 1e-05,
    "vocab_size": -1
}

Any idea what I'm missing?

mikebirdgeneau commented 7 months ago

There are a few differences between the model architectures; this article does a decent job of going through the PyTorch implementations and the key differences:

https://github.com/neobundy/Deep-Dive-Into-AI-With-MLX-PyTorch/blob/master/deep-dives/001-mistral-7b/README.md

I was looking at whether this was possible as well, but haven't had time to run through the details in full yet.
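One of those key differences is attention layout: Llama-2-7B uses plain multi-head attention, while Mistral-7B uses grouped-query attention, so the `wk`/`wv` projection matrices have different shapes. A minimal sketch of why that alone produces gibberish when Mistral weights are loaded into unmodified Llama-2 code (the values `n_kv_heads = 8` and `head_dim = 128` are taken from Mistral-7B-v0.1's published config and are assumptions here, not something this repo checks):

```python
# Sketch: shape mismatch between Llama-2-7B (MHA) and Mistral-7B (GQA) attention.
dim = 4096
head_dim = 128
n_heads = 32      # query heads, same in both models
n_kv_heads = 8    # Mistral-7B only; Llama-2-7B effectively has 32 KV heads

# Llama-2-7B: wk/wv project dim -> n_heads * head_dim (a square 4096x4096 matrix).
llama_kv_rows = n_heads * head_dim
# Mistral-7B: wk/wv project dim -> n_kv_heads * head_dim (a 1024x4096 matrix).
mistral_kv_rows = n_kv_heads * head_dim

assert llama_kv_rows == 4096
assert mistral_kv_rows == 1024
# A loader that assumes square (dim, dim) key/value projections will
# misinterpret Mistral's rectangular ones, scrambling every attention layer.

def repeat_kv(kv_heads, n_rep):
    """GQA fix-up: duplicate each KV head so n_heads query heads each
    get a KV head to attend with (each KV head serves n_rep query heads)."""
    return [h for h in kv_heads for _ in range(n_rep)]

kv = list(range(n_kv_heads))
expanded = repeat_kv(kv, n_heads // n_kv_heads)
assert len(expanded) == n_heads          # 8 KV heads expanded to 32
assert expanded[:4] == [0, 0, 0, 0]      # first KV head shared by 4 query heads
```

So beyond the params.json above, the attention code itself would need an `n_kv_heads` parameter and a KV-head expansion (or grouped lookup) step, plus Mistral's sliding-window attention mask, before the weights could be used correctly.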