@AlessandroSpallina please comment so I can assign you. Thanks :)
I'm here!
Looks to me like LangChain is already doing this; we can probably rely on it by passing chat history and system prompt as `HumanMessage`, `AIMessage` and `SystemMessage` objects from within `cat.llm`.

The API for `cat.llm` I suggest is:
```py
def llm(self, prompt, chat=False, stream=False):
    # here we retrieve `chat_history` from working memory and convert it to langchain objects
    pass
```
Not sure about the `SystemMessage` though?
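For illustration, a minimal sketch of that conversion step (the shape of the history entries and the helper name are assumptions, not the actual Cat internals):

```py
from langchain.schema import AIMessage, HumanMessage, SystemMessage

def history_to_langchain_messages(history: list, system_prompt: str = "") -> list:
    """Convert working-memory history entries into LangChain message objects."""
    messages = [SystemMessage(content=system_prompt)] if system_prompt else []
    for turn in history:
        # each `turn` is assumed to look like {"who": "Human" | "AI", "message": str}
        if turn["who"] == "Human":
            messages.append(HumanMessage(content=turn["message"]))
        else:
            messages.append(AIMessage(content=turn["message"]))
    return messages
```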
I wanna help with this, but:
```py
# Zephyr llm (`llm` is the model instance, defined elsewhere)
from langchain.chains import LLMChain
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate

template = "<|system|>\nYou are a helpful assistant that translates {input_language} to {output_language}</s>\n"
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template = "<|user|>\n{text}</s>\n"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
final_part_prompt = ChatPromptTemplate.from_template("<|assistant|>\n")
final_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt, final_part_prompt])

chain = LLMChain(llm=llm, prompt=final_prompt)
out = chain.run(input_language="English", output_language="French", text="My family are going to visit me next week.")
```
3. Separate the cat prompt into "layers":
Right now there isn't a system prompt, a user prompt, or even an agent prompt that we can dynamically attach to the final prompt; in fact we only have the prefix and suffix prompts. If we define this spec, we get even more customization of the prompt, because there is a hook for each type of prompt!
4. We need to define a spec to be able to parse these model templates: Hugging Face is trying to impose the [ChatML format](https://github.com/openai/openai-python/blob/main/chatml.md), but many open LLMs like Llama or Zephyr have different model templates. My possible solutions are: simple "parsing by replacing strings", or using [Jinja template strings](https://jinja.palletsprojects.com/en/3.1.x/) like Hugging Face does!
```py
class PromptTemplateTags:
    """Class that creates a prompt from an LLM template. Must match exactly the template provided to the LLM model."""

    template_tags: str
    system_tag: str
    user_tag: str

    def __init__(self, template_tags: str, system_tag: str, user_tag: str):
        self.template_tags = template_tags
        self.system_tag = system_tag
        self.user_tag = user_tag

    def create_prompt(self, system_message: str = "", user_message: str = "") -> str:
        # replace the model-specific tags with the actual message contents
        prompt = self.template_tags.replace(self.system_tag, system_message).replace(self.user_tag, user_message)
        return prompt


prompt_model = """<|system|>
{{ .System }}
</s>
<|user|>
{{ .Prompt }}
</s>
<|assistant|>"""

prompt = PromptTemplateTags(prompt_model, "{{ .System }}", "{{ .Prompt }}").create_prompt()
print(prompt)
```
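For comparison, here is a minimal sketch of the Jinja alternative from point 4 (assuming the `jinja2` package; the variable names are illustrative):

```py
from jinja2 import Template

# Zephyr-style template expressed as a Jinja template
zephyr_template = Template("<|system|>\n{{ system }}</s>\n<|user|>\n{{ prompt }}</s>\n<|assistant|>\n")
print(zephyr_template.render(system="You are a helpful assistant.", prompt="Hello!"))
```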
I gathered all of this info in very little time, but I think we can define a good design base!
To address point 3, I designed this diagram by splitting the prompt into 5 hookable messages: System, LongTerm, ToolUsage, ShortTerm and ChatMessage. The idea behind LongTerm and ShortTerm is a message that can be changed on the fly by applying filters or mappers (right now via the `before_agent_starts` hook). Same for ToolUsage, but I know there is already a filter for allowed tools, so I am not sure about this one.
Why do we need to split the prompt and make it hookable if we already have the prefix and suffix hooks? The answer is simple:
(The Prompt Merge block is only for schematic purposes!)
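As a rough illustration of the idea, a minimal sketch of the five layers and the merge step (all names here are hypothetical, taken from the diagram rather than the actual Cat code):

```py
from dataclasses import dataclass

@dataclass
class PromptLayers:
    # one field per hookable message from the diagram
    system: str = ""
    long_term: str = ""
    tool_usage: str = ""
    short_term: str = ""
    chat_message: str = ""

def merge_prompt(layers: PromptLayers) -> str:
    # the "Prompt Merge" step: each layer has already passed through its own hook
    parts = [layers.system, layers.long_term, layers.tool_usage, layers.short_term, layers.chat_message]
    return "\n".join(p for p in parts if p)
```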
@valentimarco thanks for the diagram, it looks reasonable, and so does `PromptTemplateTags`.
To be totally honest, I am scared about all this fragmentation we have to deal with. Here are a few considerations:
Can we focus on the last point? I mean, in there we can pass chat history from working memory directly to LangChain's ChatGPT and ChatOllama, as you showed above. I know it's not perfect, but it's the right direction without the risk of overengineering.
Thanks a lot for dedicating the time
Maybe we can resolve this with a temporary plugin built around the `PromptTemplateTags` class, so people can use local LLMs in an efficient way...
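A minimal sketch of what such a temporary plugin could look like (the `agent_prompt_prefix` hook is one of the Cat's prefix/suffix hooks mentioned above; wrapping the prefix in Zephyr tags this way is just an illustration of the idea):

```py
from cat.mad_hatter.decorators import hook

@hook
def agent_prompt_prefix(prefix, cat):
    # wrap the default system prompt in Zephyr's model-specific tags,
    # in the same spirit as PromptTemplateTags above
    return f"<|system|>\n{prefix}</s>\n"
```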
Also: `ollama run <model:version>` (from my testing).

I agree with you; in a few months these changes may be reverted, but I don't see any possible solution for good customizability other than those explained earlier.
We saw that most of the runners:
Now we need to use chat models properly by:
Also, Ollama now supports the OpenAI pseudo-standard: https://github.com/ollama/ollama/blob/main/docs/openai.md
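For reference, a minimal sketch of talking to Ollama through that OpenAI-compatible endpoint (assumes a local Ollama server and the `openai` Python package; the model name is just an example):

```py
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # api_key is required but ignored
response = client.chat.completions.create(
    model="llama2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)
```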
Yep, we need to wait a little longer and then we can use only one class for most of the runners!
Work in progress in PR #783
Merged into develop
At the moment we insert both the system prompt (aka `prompt_prefix`) and the conversation history into the prompt, without respecting model-specific prompt tags and treating every model as a completion model.

Let's try to design and implement a solid way to leverage both prompt tags and chat models, as suggested by @AlessandroSpallina. As a hypothesis, tags could be described in factory classes and used when `cat._llm` or the agent is used.
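A minimal sketch of that hypothesis (the class and field names are hypothetical, not the actual Cat factories):

```py
from pydantic import BaseModel

class LLMSettings(BaseModel):
    """Hypothetical base factory class: each concrete config declares its prompt tags."""
    system_tag: str = ""
    user_tag: str = ""
    assistant_tag: str = ""
    stop_tag: str = ""

class LLMZephyrConfig(LLMSettings):
    system_tag: str = "<|system|>\n"
    user_tag: str = "<|user|>\n"
    assistant_tag: str = "<|assistant|>\n"
    stop_tag: str = "</s>\n"
```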
Notes: