Open sigma-andex opened 10 months ago
There are a few differences between the two model architectures; this article does a decent job of walking through the PyTorch implementations and the key differences:
I was looking at whether this was possible as well, but haven't had time to run through the details in full yet.
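For what it's worth, one difference that usually comes up is that Mistral 7B uses sliding-window attention, whereas Llama uses a plain causal mask. Below is a minimal sketch of the two masks in plain Python, not taken from either codebase. The function name `causal_mask` and the convention that a window of `W` covers the current token plus the `W - 1` tokens before it are my own assumptions; real implementations differ on that off-by-one.

```python
def causal_mask(seq_len, window=None):
    """Build a seq_len x seq_len boolean mask.

    mask[i][j] is True when query position i may attend to key position j.
    With window=None this is an ordinary causal mask; with an integer window
    each query only sees itself and the (window - 1) positions before it.
    The window convention is an assumption made for this sketch.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # never attend to future positions
            if window is not None:
                # additionally drop keys that fall outside the window
                visible = visible and (i - j) < window
            row.append(visible)
        mask.append(row)
    return mask


full = causal_mask(5)                # plain causal mask
windowed = causal_mask(5, window=3)  # sliding-window variant

for name, m in (("full", full), ("windowed", windowed)):
    print(name)
    for row in m:
        print("".join("x" if v else "." for v in row))
```

Under this convention the two masks are identical whenever the sequence is no longer than the window, so the mask alone only changes behaviour on longer prompts.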
Hi,
I tried running this with mistralai_Mistral-7B-v0.1, which as far as I understand uses the Llama architecture. However, testing the model only produces gibberish. I used the following params.json:
Any idea what I'm missing?