System prompt Override of Llama3 broken

Bardia323 commented 2 months ago

Using the basic Ollama API scheme over Llama3 is breaking the chat when calling embeddings gives erroneous output:

The template for Llama3-instruct scheme is as follows:

{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>

I suspect the system prompt override for using embeddings is not including the right headers and end of sequence token leading to. It would also be nice if I could modify the override myself to make sure my chat's character is not erased.

Thanks Brian!

Cheers, Bardia

brianpetro commented 2 months ago

Hey @Bardia323

Thanks for bringing this issue to my attention.

I'm not very familiar with how instruct models work on Ollama. Do they use the same chat-style API as chat models? Or are they different, like using instruct models via the OpenAI API?

I'm also unfamiliar with the scheme and how it's used in Ollama. Any documentation you can point me to would help me better understand this issue.

Below is a screenshot of a few logs in the dev console that you might find useful in figuring out how Smart Connections prepares the request to be sent to Ollama. These logs should appear each time you send a chat message.

Screenshot 2024-04-22 at 11 35 48 AM

Being able to override all of the system prompts is something I am working on. So far, I don't have an ETA, but it is high on my priority list.

Thanks for your help in solving this 🌴

Bardia323 commented 2 months ago

Hey Brian,

Thanks for getting back to me so quickly on this. Different instruct models have different templates such that they get wrapped in between some custom tokens to indicate whether they're user prompts, system prompts or model outputs. In the case of Llama3, these tokens can be seen in the codeblock i provided above.

In order to have support for custom models, it would be a good idea to manually specify and inject these tokens depending on the model such that it allows for flexibility. You can see an example of such a request with a custom template to the Ollama API below. I.e. "Raw mode"

Here the instructions are wrapped between the identifying tokens that specifies that it's an instruction. You can read the full documentation of the API here:

https://github.com/ollama/ollama/blob/main/docs%2Fapi.md

Let me know if this helps and if there is anything I missed in my reply.

Cheers, Bardia

brianpetro commented 2 months ago

@Bardia323, thanks a lot for the clarification!

While I'm still figuring out the specifics of using the raw property, I think I know why this isn't working with the Smart Chat.

The Smart Chat local model integration only supports the /api/chat endpoint and not the /api/generate endpoint. Afaik Llama 3 has so far only been released as an instruct model, and that might make it dependent on the /api/generate endpoint currently (though I'm not sure about this, maybe you can clarify).

Here are some ideas about moving forward:

Easy: wait for Llama 3 to be released as a chat model.

Intermediate: See if there is any way for Ollama to serve instruct models over the api/chat endpoint since Smart Chat already supports this format.

Advanced: Since Open Router seems to have successfully created an adapter to serve instruct endpoints in the chat format (evidenced by Llama 3 working with Smart Chat when using Open Router), a similar adapter may be contributed to the smart-chat-model module. With the adapter, I can add a setting to the custom local model settings that would allow using this adapter in Smart Chat with local models.

As far as the last "advanced" option goes, I still need to wrap my head around the specifics of how Ollama works with the api/generate endpoint and how the raw scheme comes into play before I can make progress on this on my own. But if you are familiar with JavaScript, the smart-chat-model.js file is the primary structure, and the adapters/ directory has multiple examples of existing adapters that provide the necessary overrides for methods found in smart-chat-model.js. I'd be happy to assist if you decide to go this route.

On a side note, I plan to experiment further with instruct models. I think there is a lot of unexplored territory with how these might be utilized. But I also wasn't anticipating that their integration would be in the form of the Smart Chat; rather, I had some other user experiences planned for their use.

I wish I had a way to get this working immediately! 🌴

brianpetro commented 2 months ago

@Bardia323 someone was able to get Llama 3 working with Ollama in the Smart Chat, and they shared their configuration here https://github.com/brianpetro/obsidian-smart-connections/issues/559#issuecomment-2070851452

Bardia323 commented 2 months ago

Hey Brian,

Thanks for sharing this! I'll give it a go tonight!

🙏 Bardia

brianpetro / obsidian-smart-connections

System prompt Override of Llama3 broken #560