AmineDiro / cria

OpenAI-compatible API for serving LLAMA-2 models
MIT License

LLama2 Chat Prompting incorrect? #18

Open 9876691 opened 10 months ago

9876691 commented 10 months ago

According to this guide, https://replicate.com/blog/how-to-prompt-llama, a prompt with history would look like:

<s>[INST] <<SYS>>
You are a helpful... bla bla.. assistant
<</SYS>>

Hi there! [/INST] Hello! How can I help you today? </s><s>[INST] What is a neutron star? [/INST] A neutron star is a ... </s><s> [INST] Okay cool, thank you! [/INST]

It may even be that the newlines can be removed.

So I think this prompt technique should replace the one in https://github.com/AmineDiro/cria/blob/main/src/routes/chat.rs#L16
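
To make the proposal concrete, here is a minimal sketch of how that layout could be assembled from a list of chat messages. It is not the code currently in `chat.rs`: the `Role` and `Message` types and the `build_llama2_prompt` function are hypothetical names used only for illustration, and the exact whitespace follows the example quoted above. Whether the literal `<s>` and `</s>` strings end up as real BOS/EOS tokens depends on the tokenizer, so that part is an assumption too.

```rust
/// Role of a chat message (hypothetical type for this sketch, not cria's own).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

/// One chat message (hypothetical type for this sketch).
struct Message {
    role: Role,
    content: String,
}

/// Builds a Llama 2 chat prompt in the layout quoted above:
/// `<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST] {assistant} </s>...`
fn build_llama2_prompt(messages: &[Message]) -> String {
    let mut prompt = String::new();
    // The system message is not a turn of its own; it is folded into
    // the next user turn inside the <<SYS>> block.
    let mut pending_system: Option<&str> = None;

    for msg in messages {
        match msg.role {
            Role::System => pending_system = Some(msg.content.trim()),
            Role::User => {
                // Every user turn opens a new <s>[INST] ... [/INST] segment.
                prompt.push_str("<s>[INST] ");
                if let Some(sys) = pending_system.take() {
                    prompt.push_str("<<SYS>>\n");
                    prompt.push_str(sys);
                    prompt.push_str("\n<</SYS>>\n\n");
                }
                prompt.push_str(msg.content.trim());
                prompt.push_str(" [/INST]");
            }
            Role::Assistant => {
                // The assistant reply closes the segment with </s>.
                prompt.push(' ');
                prompt.push_str(msg.content.trim());
                prompt.push_str(" </s>");
            }
        }
    }
    prompt
}
```

Calling `build_llama2_prompt` with a system message, one completed user/assistant exchange, and a final user message yields a string that ends in `[/INST]`, leaving the model to generate the next assistant reply.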

AmineDiro commented 10 months ago

🤔 Interesting! I actually think you are right, but I tested chat/completions on some examples and didn't see any issues. Did you experience issues when testing? Btw, I am testing with llama-2-7b-chat.ggmlv3.q4_0.bin

9876691 commented 10 months ago

I'll try llama-2-7b-chat.ggmlv3.q4_0.bin and let you know what results I get. I'll also test the streaming.

Thanks

9876691 commented 10 months ago

@AmineDiro Could you run the Docker build action for me? I use the cria image as my base image and then add the model from Hugging Face.