sam-trost commented 3 weeks ago

When the agent's response includes tool calls, the chat history mechanism includes the tool call and tool response messages in subsequent calls to the LLM. The request is rejected (422 Unprocessable Entity in the logs below) because the assistant's tool call message has null content and is not properly formatted for use as an input message.

To reproduce, run the simple-app from the samples. First turn: ask for today's date and wait for the AI response. Second turn: ask for the current time.

No response is played. See the logs below.

xrx-orchestrator  | [13:49:56 UTC] DEBUG: Agent response: {"messages":[{"content":null,"role":"assistant","tool_calls":[{"id":"call_900x","function":{"arguments":"{}","name":"get_current_time"},"type":"function"}]},{"tool_call_id":"call_900x","role":"tool","name":"get_current_time","content":"2024-10-19 13:49:56"},{"role":"assistant","content":"Today's date is October nineteenth, two thousand twenty-four."}],"node":"CustomerResponse","output":"Today's date is October nineteenth, two thousand twenty-four.","session":{"guid":"785073c4-b71b-47d9-aefa-b14219da584a"}}
xrx-orchestrator  | [13:49:56 UTC] DEBUG: Handling agent response of type CustomerResponse: Today's date is October nineteenth, two thousand twenty-four., modality: audio
xrx-orchestrator  | [13:49:56 UTC] DEBUG: Received from Agent:Today's date is October nineteenth, two thousand twenty-four.
xrx-orchestrator  | [13:49:56 UTC] DEBUG: Sending to TTS:Today's date is October nineteenth, two thousand twenty-four.
xrx-tts           | INFO:httpx:HTTP Request: POST https://api.openai.com/v1/audio/speech "HTTP/1.1 200 OK"
xrx-tts           | INFO:openai_tts:Cached synthesized audio to cache/71012942d6afe02a5d583ef121ab7932.pcm
xrx-tts           | INFO:openai_tts:Finished synthesizing speech. Total chunks: 15
xrx-orchestrator  | [13:49:58 UTC] DEBUG: Received from TTS: {"action":"done"}
xrx-orchestrator  | [13:49:58 UTC] DEBUG: TTS done, sending cached agent responses
xrx-stt           | INFO:groq_stt:Transcribing audio using Groq API...
xrx-stt           | INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/audio/transcriptions "HTTP/1.1 200 OK"
xrx-stt           | INFO:groq_stt:-----
xrx-stt           | INFO:groq_stt:Transcription(text=" What's the current time?", task='transcribe', language='English', duration=1.02, segments=[{'id': 0, 'seek': 0, 'start': 0, 'end': 1.04, 'text': " What's the current time?", 'tokens': [50365, 708, 311, 264, 2190, 565, 30, 50417], 'temperature': 0, 'avg_logprob': -0.284207, 'compression_ratio': 0.75, 'no_speech_prob': 0.02665573}], x_groq={'id': 'req_01jajgnr2jef98qcez05brypfx'})
xrx-tts           | INFO:main:Received action: cancel
xrx-tts           | INFO:main:Received cancel action
xrx-reasoning     | INFO:     172.19.0.7:51190 - "POST /run-reasoning-agent HTTP/1.1" 422 Unprocessable Entity
xrx-orchestrator  | [13:50:06 UTC] DEBUG: ---> Sending audio to STT
xrx-orchestrator  | [13:50:06 UTC] DEBUG: ---> Sent audio to STT of size 32768
xrx-orchestrator  | [13:50:06 UTC] DEBUG: Received from STT:  What's the current time?
xrx-orchestrator  | [13:50:06 UTC] DEBUG: Received from STT:  What's the current time?
xrx-orchestrator  | [13:50:06 UTC] DEBUG: Cancelling all agent activity
xrx-orchestrator  | [13:50:06 UTC] DEBUG: Sending to agent:What's the current time?
xrx-orchestrator  | [13:50:06 UTC] DEBUG: send chat history to agent
xrx-orchestrator  | [13:50:06 UTC] DEBUG: Sending to agent xrx-reasoning:8003/run-reasoning-agent: [{"role":"user","content":"What day is today?"},{"content":null,"role":"assistant","tool_calls":[{"id":"call_900x","function":{"arguments":"{}","name":"get_current_time"},"type":"function"}]},{"tool_call_id":"call_900x","role":"tool","name":"get_current_time","content":"2024-10-19 13:49:56"},{"role":"assistant","content":"Today's date is October nineteenth, two thousand twenty-four."},{"role":"user","content":"What's the current time?"}]
xrx-orchestrator  | [13:50:06 UTC] DEBUG: Sending to agent: {"guid":"785073c4-b71b-47d9-aefa-b14219da584a"}
xrx-orchestrator  | [13:50:06 UTC] DEBUG: Raw output: {"detail":[{"type":"string_type","loc":["body","messages",1,"content"],"msg":"Input should be a valid string","input":null}]}

Note that the assistant's tool call and the tool response from the first turn are included in the second-turn request:

xrx-orchestrator  | [13:50:06 UTC] DEBUG: Sending to agent xrx-reasoning:8003/run-reasoning-agent: [{"role":"user","content":"What day is today?"},{"content":null,"role":"assistant","tool_calls":[{"id":"call_900x","function":{"arguments":"{}","name":"get_current_time"},"type":"function"}]},{"tool_call_id":"call_900x","role":"tool","name":"get_current_time","content":"2024-10-19 13:49:56"},{"role":"assistant","content":"Today's date is October nineteenth, two thousand twenty-four."},{"role":"user","content":"What's the current time?"}]

Specifically, these two messages look like they should be omitted:

{"content":null,"role":"assistant","tool_calls":[{"id":"call_900x","function":{"arguments":"{}","name":"get_current_time"},"type":"function"}]},
{"tool_call_id":"call_900x","role":"tool","name":"get_current_time","content":"2024-10-19 13:49:56"}
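For illustration, here is a minimal sketch of what omitting those two messages could look like before the history is sent to the agent. The helper name `strip_tool_messages` is hypothetical and is not part of xrx-core; the sketch only assumes the message shapes shown in the logs above.

```python
from typing import Any, Dict, List


def strip_tool_messages(history: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop assistant tool-call requests and tool results from a chat history.

    Hypothetical helper: keeps only messages that carry plain text content.
    """
    return [
        message
        for message in history
        # Skip tool results and assistant messages that only request a tool call.
        if message.get("role") != "tool" and not message.get("tool_calls")
    ]


# History taken from the second-turn request in the logs.
history = [
    {"role": "user", "content": "What day is today?"},
    {
        "content": None,
        "role": "assistant",
        "tool_calls": [
            {
                "id": "call_900x",
                "function": {"arguments": "{}", "name": "get_current_time"},
                "type": "function",
            }
        ],
    },
    {
        "tool_call_id": "call_900x",
        "role": "tool",
        "name": "get_current_time",
        "content": "2024-10-19 13:49:56",
    },
    {
        "role": "assistant",
        "content": "Today's date is October nineteenth, two thousand twenty-four.",
    },
    {"role": "user", "content": "What's the current time?"},
]

filtered = strip_tool_messages(history)
```

With this filter, only the two user messages and the final assistant reply remain, so every `content` field is a string.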

mprast commented 2 weeks ago

@sam-trost thanks for looking into this and contributing a fix!

What's happening here is that we're auto-generating APIs on the agent side using FastAPI. FastAPI generates an API from a Python function and validates input based on the Pydantic types we use for the function's arguments. Currently the Message type declares content as a non-nullable string, hence the validation failures when content is null.
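A rough way to see the failure in isolation, using Pydantic directly rather than the actual FastAPI endpoint. `StrictMessage` and `StrictRequest` are hypothetical stand-ins for the current types, not the real definitions in xrx-core:

```python
from typing import Any, Dict, List

from pydantic import BaseModel, ValidationError


class StrictMessage(BaseModel):
    # Stand-in for the current Message type: content must be a string.
    role: str
    content: str


class StrictRequest(BaseModel):
    messages: List[StrictMessage]


def validation_errors(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return Pydantic's error list for the payload, or [] if it validates."""
    try:
        StrictRequest(**payload)
    except ValidationError as exc:
        return list(exc.errors())
    return []


payload = {
    "messages": [
        {"role": "user", "content": "What day is today?"},
        {
            "content": None,  # null content on the assistant's tool-call message
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "call_900x",
                    "function": {"arguments": "{}", "name": "get_current_time"},
                    "type": "function",
                }
            ],
        },
    ]
}

errors = validation_errors(payload)
```

The single error points at `messages[1].content`, which lines up with the `loc` of `["body","messages",1,"content"]` in the 422 response above (FastAPI adds the `body` prefix).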

I'm hesitant to mess around with the messages array because I think it's confusing for the agent to have a different accounting of the message history than the orchestrator and the client. My preferred approach is to widen the Pydantic types to accommodate tool call messages and null content bodies. I'll open a separate PR with those changes.
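As a rough sketch of what widened types might look like (the class and field names below are assumptions modeled on the message shapes in the logs, not the actual xrx-core definitions or the contents of the follow-up PR):

```python
from typing import List, Optional

from pydantic import BaseModel


class FunctionCall(BaseModel):
    name: str
    arguments: str  # JSON-encoded argument string, e.g. "{}"


class ToolCall(BaseModel):
    id: str
    type: str = "function"
    function: FunctionCall


class Message(BaseModel):
    role: str
    # content may be null on assistant messages that only request a tool call.
    content: Optional[str] = None
    # Present on tool result messages.
    name: Optional[str] = None
    tool_call_id: Optional[str] = None
    # Present on assistant messages that request tool calls.
    tool_calls: Optional[List[ToolCall]] = None


class AgentRequest(BaseModel):
    messages: List[Message]


# The same history that produced the 422, now accepted without changes.
request = AgentRequest(
    messages=[
        {"role": "user", "content": "What day is today?"},
        {
            "content": None,
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "call_900x",
                    "function": {"arguments": "{}", "name": "get_current_time"},
                    "type": "function",
                }
            ],
        },
        {
            "tool_call_id": "call_900x",
            "role": "tool",
            "name": "get_current_time",
            "content": "2024-10-19 13:49:56",
        },
        {
            "role": "assistant",
            "content": "Today's date is October nineteenth, two thousand twenty-four.",
        },
        {"role": "user", "content": "What's the current time?"},
    ]
)
```

This keeps the agent's view of the history identical to the orchestrator's and the client's, at the cost of a looser schema: a message with neither content nor tool calls would also validate unless an extra check is added.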

mprast commented 2 weeks ago

8090-inc / xrx-core

Subsequent turns fail after a tool is used #22

23