Use chat models properly (prompt tags already fixed)

pieroit commented 9 months ago

At the moment we insert both system prompt (aka prompt_prefix) and conversation history in the prompt, without respecting model-specific prompt tags and treating every model as a completion model.

Let's try to design and implement a solid way to both leverage prompt tags and chat models, as suggested by @AlessandroSpallina. As an hypothesis, tags could be described in factory classes and used when cat._llm or the agent is used.

Notes:

I don't know how langchain deals with this (we should research it because maybe it solves it already)
HuggingFace chat templates can also be of inspiration as their solution is quite elegant
we can tackle the tags and the completion vs chat issues in two different PRs as it may get complicated

pieroit commented 9 months ago

@AlessandroSpallina please comment so I can assign you. Thanks :)

AlessandroSpallina commented 9 months ago

i’m here!

pieroit commented 8 months ago

Looks to me Langchain is already doing this, we probably can rely on it by passing chat history and system prompt as HumanMessage, AIMessage and SystemMessage objects from within cat.llm.

The API for cat.llm I suggest is:

def llm(self, prompt, chat=False, stream=False):
  # here we retrieve `chat_history` from working memory and convert it to langchain objects 
  pass

Not sure about the SystemMessage though?

valentimarco commented 8 months ago

I wanna help with this but:

There are already langchain's chat models that do some work, but the ollama is implemention is bad: They hard coded the Template crafting section with the llama2 template... (We use the llm classes that in case of ollama, call the llm using RestAPI. We are safe!)

Langchain implements messege type system (Right now there are System, Human, AI messege, with a Chatmessage to handle custom types) and methods to craft the prompt:


#Zephyr llm
template="<|system|>\nYou are a helpful assistant that translates {input_language} to {output_language}</s>\n"
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template="<|user|>\n{text}</s>\n"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
final_part_prompt = ChatPromptTemplate.from_template("<|assistant|>\n")
final_prompt = chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt,final_part_prompt])

chain = LLMChain(llm=llm, prompt=final_prompt)

out = chain.run(input_language="English", output_language="French", text="My family are going to visit me next week.")

3. Separate the cat prompt in "layers":
 Right now there isn't a System Prompt or a User prompt or even Agent prompt that we can dinamically attach on the final prompt, infact we have only Prefix and suffix prompt. If we define this spec, we get even more customization of the prompt, bc there is a hook for each type of prompt!
4. We need to define a spec to be able to parse this model templates: Hugging face is trying to impose the [ChatML format](https://github.com/openai/openai-python/blob/main/chatml.md), but many open llm like llama or zephyr have different model templates. My possible solution are: Simple "parsing by replacing strings" or use [Jinja template strings](https://jinja.palletsprojects.com/en/3.1.x/) like hugging face!
```py
class PromptTemplateTags:
    """Class that create Prompt from llm Template. Must be the exact same as the one provide to the llm model."""
    templateTags: str
    SystemTag: str
    UserTag: str
    def __init__(self, templateTags: str, SystemTag: str, UserTag: str):
        self.templateTags = templateTags
        self.SystemTag = SystemTag 
        self.UserTag = UserTag

    def create_prompt(self, system_message: str = "", user_message: str = "") -> PromptTemplate:
        prompt = self.templateTags.replace(self.SystemTag, system_message).replace(self.UserTag, user_message)
        return prompt

 prompt_model ="""<|system|>
{{ .System }}
</s>
<|user|>
{{ .Prompt }}
</s>
<|assistant|>"""

prompt = PromptTemplateTags(prompt_model, "{{ .System }}", "{{ .Prompt }}").create_prompt()

print(prompt)

I gather all of this infos with very little time but i think we can define a good Design Base!

valentimarco commented 8 months ago

To answer the 3. point, i design this diagram by splitting the prompt in 5 Hookable Message: System, LongTerm, ToolUsage, ShortTerm and ChatMessage. The idea behind LongTerm and ShortTerm is a Message that can by change in the fly by applying filter or mappers. (right now with before_agent_starts hook ) Same for ToolUsage but i know there is already a filter for allowed ones, so i am now sure about this one

Why do we need to split the prompt and be hookable if we have already the prefix e suffix hooks? The answer is simple:

Modify only the neccessary part of the prompt
Create a distinct separation from the System and Conversation prompt (with we need for the template!)
Define more atomic hooks for the cat!

(The Prompt Merge block is only for the schematic purpose!)

Obsidian_aEFvL5Z70R

pieroit commented 8 months ago

@valentimarco thanks for the diagram looks reasonable, also the PromptTemplateTags.

To be totally honest I am scared about all this fragmentation we have to deal with. Here a few consideration:

Even if langchain does not do it properly, Ollama itself will probably handle soon model-specific tags and have completion vs chat endpoints. Is it worth it to do all of this in the Cat?
If an open standard or a de-facto standard stabilizes (like the ChatML you mention), we would have to switch back again. Think how many things changed in 2/3 months... I expect a standard on this soon (and models trained on the standard, and runners with adeguate endpoints)
Priority for us is passing langchain the chat history for chat models instead of serializing convo manually

Can we focus on the last point? I mean inside here we can pass chat history from working memory directly to langchain ChatGPT and ChatOllama, as in here as you showed above. I know it's not peferct, but is the right direction without the risk of overengineering

Thanks a lot for dedicating the time

valentimarco commented 8 months ago

Maybe we can resolve with a temp plugin with PromptTemplateTags class, so people can use in an efficient way local LLMs... Also:

Ollama handles specific model-specific tags but only if you use with ollama run <model:version> (from my testing)
We can do a issue to langchain to inform discrepancy when using models different from llama2 (but they will probably risponde with the 2 point you describe)

I agree with you, in few months this changes maybe be revert but i don't see any possible solution for a good customability rather than those explained early

valentimarco commented 6 months ago

We saw that most of the runner:

use the OpenAi RESTAPI schema
Handles Prompt tags for each model

Now we need to use chat models properly by:

Create a list of message that represent the chat history (maybe using chat prompts of langchain)
Support only chat models.

pieroit commented 4 months ago

Also Ollama now supports the OpenAI pseudo standard https://github.com/ollama/ollama/blob/main/docs/openai.md

valentimarco commented 4 months ago

Yep, we need to wait a little more and we can use only one class for most of the runners!

pieroit commented 2 months ago

Work in progress in PR #783

pieroit commented 1 month ago

Merged into develop

cheshire-cat-ai / core

Use chat models properly (prompt tags already fixed) #480