NVIDIA / NeMo-Guardrails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

Why does the system sometimes enter multiple similar LLMChains in conversation? #26

Open MeowLake opened 1 year ago

MeowLake commented 1 year ago

Issue:

Sometimes the system enters two LLMChains that execute similar actions in a conversation when using the 'generate' function. Is this expected or not?

Log for a single conversation:

--------------------------  user ask:  -------------------------
Who are you?

> Entering new LLMChain chain...

> Finished chain.
ask for bot's identity

bot respond about identity
  "I am an AI assistant designed to help you with anything you need. What can I assist you with today?"

> Entering new LLMChain chain...

> Finished chain.
bot respond with its name and purpose

> Entering new LLMChain chain...

> Finished chain.
"I am an AI assistant designed to help you with anything you need. My name is Homelander."
--------------------------  new message:  -------------------------
{'role': 'assistant', 'content': 'I am an AI assistant designed to help you with anything you need. My name is Homelander.'}

Log for a three-round chat (with history appended):

-------------------------- 1st round user ask:  -------------------------
Hello there

> Entering new LLMChain chain...

> Finished chain.
express greeting

-------------------------- 1st round bot response:  -------------------------
{'role': 'assistant', 'content': 'Hey there!\nHow are you doing?'}

-------------------------- 2nd round user ask:  -------------------------
I feel very bad today.

> Entering new LLMChain chain...

> Finished chain.
express negative emotion
bot express empathy
  "I'm sorry to hear that. Is there anything I can do to help you feel better?"

> Entering new LLMChain chain...

> Finished chain.
bot express empathy

> Entering new LLMChain chain...

> Finished chain.
"I'm sorry to hear that. Is there anything I can do to help you feel better?"

-------------------------- 2nd round bot response:  -------------------------
{'role': 'assistant', 'content': "I'm sorry to hear that. Is there anything I can do to help you feel better?"}

-------------------------- 3rd round user ask:  -------------------------
What can you do for me?

> Entering new LLMChain chain...

> Finished chain.
ask about capabilities
bot respond about capabilities
  "As I mentioned earlier, I can do anything! Specifically, I can help you with tasks such as scheduling appointments, setting reminders, providing information on various topics, and even ordering food or making reservations. Is there anything specific you need help with?"

> Entering new LLMChain chain...

> Finished chain.
bot inform capabilities

-------------------------- 3rd round bot response:  -------------------------
{'role': 'assistant', 'content': 'I am a AI Homelander, I do whatever I want!'}

Repro steps:

  1. Create a config.yml that only contains a sample_conversation, as the configuration guide shows.
  2. Create a hello.co following the hello world example.
  3. Change the message under define bot inform capabilities to "I am a AI Homelander, I do whatever I want!"
  4. Define app and run:
from nemoguardrails import LLMRails, RailsConfig
from langchain.chat_models import ChatOpenAI

config = RailsConfig.from_path("path/to/config/folder")
app = LLMRails(config=config,
               llm=ChatOpenAI(temperature=0, openai_api_key="<my openai key>"),
               verbose=True)

history = []
user_message = "<my input>"
history.append({"role": "user", "content": user_message})
bot_message = app.generate(messages=history)
print(bot_message)
history.append(bot_message)
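
For reference, here is a rough in-memory equivalent of steps 1-3. The exact sample_conversation and flow text below are my reconstruction from the docs examples, not a verbatim copy, and RailsConfig.from_content stands in for from_path:

from nemoguardrails import RailsConfig

# ASSUMPTION: the sample_conversation and flows below approximate the
# docs' configuration guide and hello world example; adjust to match.
yaml_content = """
sample_conversation: |
  user "Hello there!"
    express greeting
  bot express greeting
    "Hello! How can I assist you today?"
"""

colang_content = """
define user express greeting
  "Hello there"

define bot express greeting
  "Hey there!"

define bot ask how are you
  "How are you doing?"

define flow greeting
  user express greeting
  bot express greeting
  bot ask how are you

define bot inform capabilities
  "I am a AI Homelander, I do whatever I want!"
"""

config = RailsConfig.from_content(colang_content=colang_content, yaml_content=yaml_content)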

Concerns:

  1. Only the 1st round user ask in the three-round chat entered one LLMChain; all the other turns entered two LLMChains.
  2. The two LLMChains execute similar actions, but not exactly the same ones. For example, for the 2nd round user ask, one chain is express negative emotion + bot express empathy, while the other is only bot express empathy.
  3. The responses in the two chains can be very different. For example, for the 3rd round user ask, one chain responds very formally about capabilities, while the other returns nothing; interestingly, the returned bot_message does use the defined Homelander material.
  4. I tried just using LangChain's ChatOpenAI with ConversationChain (see the sketch below); only one chain is entered in each chat turn, so it doesn't look like an issue on the LangChain side.
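
A minimal sketch of that plain-LangChain baseline (assuming the LangChain version current at the time):

from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI

# Baseline without guardrails: with verbose=True, each predict() call
# enters exactly one LLMChain.
llm = ChatOpenAI(temperature=0, openai_api_key="<my openai key>")
chain = ConversationChain(llm=llm, verbose=True)
print(chain.predict(input="I feel very bad today."))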

Package and versions:

Can you please help identify the expected/unexpected behaviors above?

drazvan commented 1 year ago

Hi @MeowLake!

Let me try to shed some light on what's happening behind the scenes.

First, the guardrails process has three main stages: interpreting the user's input (a.k.a. generating the canonical user message), deciding what the bot should do/say next, and generating the bot utterance. In each of the three stages, the LLM can potentially be involved, so a single turn can result in 1, 2 or 3 LLM calls. I'll show this using the 3 rounds in your examples below. But before that, there's one important observation I want to make: currently, we only use the first line from the output of the LLM. Sometimes you will see multiple lines predicted after the chain finishes, but we discard them. We show them for information purposes, and this shows that the LLM is capable of predicting potentially the whole thing in one go. We'll be optimising for that in one of the future versions.

  1. A single call. When you say "hello there", like in your first turn, a first call is made to the LLM to generate the canonical user message (express greeting). Next, because there is a flow in the configuration that starts with user express greeting, that flow is used to determine the next step, i.e. bot express greeting. And because there is already a static message for that, it is used directly. So, no additional LLM calls were needed.

  2. Two calls. When you ask "What can you do for me?" in the third turn, the first call to the LLM generates the ask about capabilities canonical form. The next call generates the next step, i.e. bot inform capabilities. For the third step, the LLM is not needed since you've explicitly defined the bot inform capabilities message.

  3. Three calls. When you say "I feel very bad today" in your second turn, the first LLM call generates the canonical user message express negative emotion. Because of a bug that has now been fixed, the next step is not generated from the static flow, and a second LLM call is used to generate it, i.e. bot express empathy. For the third step, the LLM is used to generate the bot express empathy message.
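
Put differently, each static definition removes one LLM call. A sketch against a hello-world style config (the flow text here is illustrative, not your exact files):

# Case 1 -- one call: the flow and the bot message are both static, so
# only the canonical user message needs the LLM:
#     define flow greeting
#       user express greeting
#       bot express greeting
#
# Case 2 -- two calls: the bot message is static, but no flow leads to it,
# so the LLM must also decide the next step:
#     define bot inform capabilities
#       "I am a AI Homelander, I do whatever I want!"
#
# Case 3 -- three calls: the next step should have come from the static
# flow, but the bug mentioned above sent it to the LLM; there is also no
# static bot express empathy message, so the utterance needs a third call.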

I hope this helps. This week, we'll push a new round of updates to GitHub, and they will be published to PyPI at the end of the month. One of the updates is a better logging experience, which shows more clearly when the LLM is used, at what stage, what the prompt/completion is, and a few other things.

MeowLake commented 1 year ago

@drazvan Thank you! These details are very helpful.

I tried to fit the above examples into the guardrails process but still found some confusing pieces. Can you please confirm the breakdown below? Let's use the 3rd chat turn as an example:

  1. Generate the canonical-form user intent - first LLM call:

> Entering new LLMChain chain...

> Finished chain.
ask about capabilities
bot respond about capabilities
  "As I mentioned earlier, I can do anything! Specifically, I can help you with tasks such as scheduling appointments, setting reminders, providing information on various topics, and even ordering food or making reservations. Is there anything specific you need help with?"

     Question: Why are there additional 'bot next step' and 'bot utterance' lines between the 1st and 2nd stage? Is the bot response "As I mentioned earlier ....." generated in the 1st LLM call? I saw similar additional pieces in every chat.

  2. Decide the bot's next action and execute it - second LLM call:

> Entering new LLMChain chain...

> Finished chain.
bot inform capabilities

  3. Generate the bot utterance. It looks like, if the utterance comes from a static flow, the response is only assigned to the returned variable but is not printed out; there is no string response following bot inform capabilities.