Open 9876691 opened 10 months ago
🤔 Interesting! I actually think you're right, but I tested chat/completions
on some examples and didn't see any issues. Did you experience issues when testing?
btw, I am testing with llama-2-7b-chat.ggmlv3.q4_0.bin
I'll try llama-2-7b-chat.ggmlv3.q4_0.bin
and let you know what results I get. I'll also test the streaming.
Thanks
@AmineDiro Could you run the Docker build action for me? I use the cria image as my base image and then add in the model from Hugging Face.
From this guide, https://replicate.com/blog/how-to-prompt-llama,
a prompt with history would look like this:
It may even be that the newlines can be removed.
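For reference, here is a rough sketch of the multi-turn prompt format described in that Replicate guide. Note this is only an illustration, not cria's actual code: the `Message` struct and `build_llama2_prompt` function are hypothetical names, and the exact whitespace/newline handling may differ (as noted above, some newlines may even be removable).

```rust
// Hypothetical builder for the Llama 2 chat format from the Replicate guide:
// [INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user_1} [/INST] {answer_1} </s><s>[INST] {user_2} [/INST]
struct Message {
    role: String, // "user" or "assistant"
    content: String,
}

fn build_llama2_prompt(system: &str, messages: &[Message]) -> String {
    // The system prompt lives inside the first [INST] block.
    let mut prompt = format!("[INST] <<SYS>>\n{}\n<</SYS>>\n\n", system);
    for (i, msg) in messages.iter().enumerate() {
        if msg.role == "user" {
            // Every user turn after the first opens a new <s>[INST] block.
            if i > 0 {
                prompt.push_str("<s>[INST] ");
            }
            prompt.push_str(&format!("{} [/INST]", msg.content));
        } else {
            // Assistant replies are appended and closed with </s>.
            prompt.push_str(&format!(" {} </s>", msg.content));
        }
    }
    prompt
}

fn main() {
    let msgs = vec![
        Message { role: "user".into(), content: "Hi!".into() },
        Message { role: "assistant".into(), content: "Hello.".into() },
        Message { role: "user".into(), content: "How are you?".into() },
    ];
    println!("{}", build_llama2_prompt("You are a helpful assistant.", &msgs));
}
```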
So I think this prompt technique should replace the one currently used in https://github.com/AmineDiro/cria/blob/main/src/routes/chat.rs#L16.