We don't make a streaming request for tool calling; we make a regular call, then translate it and return it in a streamed response.
The reason for that is that it's hard to translate tool-calling streaming responses across formats.
Open to suggestions on how we can improve here. If you have a version of this that does work across formats, we'd welcome a PR! @EricWF
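In other words, the current behavior is roughly the following (a minimal sketch of the described approach with hypothetical, OpenAI-style dict shapes - not litellm's actual code):

```python
# Sketch only: buffer the complete response, then replay it as a
# two-chunk "stream" so callers still get an iterator.
def fake_stream(complete_response: dict):
    message = complete_response["choices"][0]["message"]
    # First chunk carries the complete tool call in one delta.
    yield {"choices": [{"delta": {"tool_calls": message.get("tool_calls", [])},
                        "finish_reason": None}]}
    # Final chunk signals the end of the (pseudo-)stream.
    yield {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]}
```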
I am very much surprised to learn this isn't a supported use case, after seeing a number of commits about "anthropic streaming tool calls".
It's not difficult to translate between the two formats. I did some digging into the litellm code, and it seems viable to implement.
Here's an OpenAI tool-call chunk:
```python
ChatCompletionChunk(
    ...
    choices=[
        Choice(
            delta=ChoiceDelta(
                ...
                tool_calls=[
                    ChoiceDeltaToolCall(
                        index=0,
                        id=None,
                        function=ChoiceDeltaToolCallFunction(
                            arguments='ASTIC',
                            name=None
                        ),
                        type=None
                    )
                ]
            ),
            ...
        )
    ],
    ...
)
```
And here's the equivalent for Anthropic:
```python
ToolsBetaContentBlockDeltaEvent(
    delta=InputJsonDelta(
        partial_json='ge": "HELLO',
        type='input_json_delta'
    ),
    index=1,
    type='content_block_delta'
)
```
The `partial_json` translates directly to `arguments` in the tool call.
The introduction of the tool call also happens in almost exactly the same manner. Again, here's OpenAI:
```python
ChoiceDeltaToolCall(
    index=0,
    id='call_qhj5Sb80ZOruV5bbS8uCvPwg',
    function=ChoiceDeltaToolCallFunction(
        arguments='',
        name='yell_really_really_really_loudly'
    ),
    type='function'
)
```
And from Anthropic:
```python
ToolsBetaContentBlockStartEvent(
    content_block=ToolUseBlock(
        id='toolu_015DeyCWQQbvdgzLdZrVyvnH',
        input={},
        name='yell_really_really_really_loudly',
        type='tool_use'
    ),
    index=1,
    type='content_block_start'
)
```
As you can see, there's a pretty direct mapping from Anthropic's streaming responses to OpenAI's.
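A rough sketch of that mapping, assuming event objects shaped like the reprs above (`translate_tool_event` is a hypothetical helper, not litellm's actual API):

```python
def translate_tool_event(event):
    """Map an Anthropic tool-use stream event onto an OpenAI-style
    tool_calls delta entry; return None for unrelated events."""
    if event.type == "content_block_start" and event.content_block.type == "tool_use":
        # The start event introduces the call: id + name, empty arguments.
        return {
            "index": event.index,
            "id": event.content_block.id,
            "type": "function",
            "function": {"name": event.content_block.name, "arguments": ""},
        }
    if event.type == "content_block_delta" and event.delta.type == "input_json_delta":
        # partial_json maps directly onto the arguments fragment.
        return {
            "index": event.index,
            "function": {"arguments": event.delta.partial_json},
        }
    return None
```

One wrinkle visible in the examples: Anthropic's `index` counts every content block (the tool block above arrives at index=1, presumably after a text block), while OpenAI's tool-call `index` counts only tool calls, so a real implementation would need to re-base it.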
At the moment, however, I don't have the cycles to implement it myself, at least not on top of the existing complexity in litellm. I picked up litellm so I wouldn't have to implement it myself :-(
That's a fair point @EricWF - I remember looking at this when Anthropic was returning XML and choosing to wait for the complete response; worth revisiting with their new format.
Curious - aren't you still rebuilding the chunks to form a complete tool call? Why doesn't the existing implementation solve your problem?
Ah, so I've built a chat TUI, and for certain tool calls, `save_file` for example, I stream the contents of the file, with syntax highlighting, as they arrive. I have to do some funky tricks to invent valid JSON from the partial response, but it works rather nicely.
Even when I can't pretty-print, it's still a better experience to see the JSON arrive. In the end I need to reconstruct the full tool call to actually call it, but the responsiveness of live printing is worth the effort.
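For the curious, a minimal sketch of that kind of JSON-repair trick (not the actual TUI code): close any open strings and brackets so the accumulated buffer parses.

```python
import json

def complete_partial_json(partial: str):
    """Close open strings/brackets in a partial JSON buffer; return the
    parsed object, or None if it still won't parse."""
    closers = []
    in_string = False
    escaped = False
    for ch in partial:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()
    repaired = partial + ('"' if in_string else "") + "".join(reversed(closers))
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None  # too little data so far; wait for the next chunk
```

For example, `complete_partial_json('{"contents": "imp')` yields `{'contents': 'imp'}`, which is enough to start rendering.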
Additionally, because I'm passing the tools with every message, the limitation prevents the streaming of non-tool messages as well (IIRC; I may be incorrect about that).
Thanks for taking the time to discuss this further.
I was also debugging this discrepancy today and was surprised that the Anthropic + tools use case was not streaming.
My use case is to construct Pydantic objects from the tool response in a streaming manner (like how it's done in https://github.com/jxnl/instructor) and stream that response from the API to my frontend.
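For illustration, a standalone sketch of that pattern (not instructor's actual API; `SaveFile` is a hypothetical tool schema):

```python
import json
from pydantic import BaseModel

class SaveFile(BaseModel):
    path: str = ""
    contents: str = ""

def partial_models(argument_fragments):
    """Yield progressively filled models as argument JSON streams in."""
    buffer = ""
    for fragment in argument_fragments:
        buffer += fragment
        # Naive repair: try the buffer as-is, then with a closed string/object.
        for suffix in ("", '"}', "}"):
            try:
                data = json.loads(buffer + suffix)
            except json.JSONDecodeError:
                continue
            # model_construct skips validation, so missing fields are fine.
            yield SaveFile.model_construct(**data)
            break
```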
Got it - I'm assuming both of you show the response to your users as it's coming in? @EricWF @azgo14
yup
@krrishdholakia Indeed.
Not sure if it's related, but I can confirm that in 1.40.0, even "regular" (non-tooling) streaming requests don't work with Anthropic like they used to: litellm appears to wait for the entire response, then send all the chunks back at once. Nothing more to add to @EricWF's message, other than to say that if I, too, manually adjust the cURL request being made to include `stream: true`, everything works as expected.
How do you adjust the cURL? @bachya
I'll try to repro with a fix by tomorrow.
Same as @EricWF: I'm taking the cURL request from the logs and manually re-running it with that parameter in place.
@bachya Found the issue - it wasn't passing the 'stream' param in the async call to the httpx client.
Fixed it: https://github.com/BerriAI/litellm/commit/5e12307a48bb21c5ec308899a87247bd6a4a78cd
Should be live soon in v1.40.1.
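For readers hitting the same thing, the bug class looks roughly like this, inferred only from the commit message above (not litellm's actual code):

```python
import httpx

async def astream_anthropic(url: str, headers: dict, data: dict):
    """Forward "stream": true in the JSON body and yield SSE lines."""
    data = {**data, "stream": True}  # the param that was being dropped
    async with httpx.AsyncClient() as client:
        async with client.stream("POST", url, headers=headers, json=data) as resp:
            async for line in resp.aiter_lines():
                yield line
```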
Appreciate you, @krrishdholakia!
@krrishdholakia Any timing on when the new release will be cut?
@krrishdholakia I've noticed we're now on 1.40.3 and https://github.com/BerriAI/litellm/commit/5e12307a48bb21c5ec308899a87247bd6a4a78cd doesn't appear to be included; intended?
Hey @bachya, I see it included - see the tag. I also see it live on main: https://github.com/BerriAI/litellm/blob/94e42dd06342b7e8a8669621ab6d1bd171cb478d/litellm/llms/anthropic.py#L164
Are you still seeing this?
@krrishdholakia Ahh, missed that: GitHub hid the little release breadcrumb bar. Just tested 1.40.1 and it worked great; thank you!
Great! Closing the ticket then.
What happened?
It appears the library fails to pass the "stream" parameter to Anthropic when creating streaming messages with tooling.
The attached log comes from running the test_acompletion_claude_3_function_call_with_streaming function. Notice that the stream parameter is not passed within the cURL command; as a result, the response isn't a streaming response, and so streaming tool calls do not work.
Manually re-running the cURL command with '"stream": true' appended to the end of the payload corrects the issue.
A little digging suggests that litellm is not set up to handle streaming function calls at all, given that the strings "input_json_delta" and "partial_json" are present in the Anthropic streaming output but are found nowhere in the litellm source code.
In the meantime, it might be worth updating the documentation to reflect the lack of support, and making the test_acompletion_claude_3_function_call_with_streaming test fail when the response is not streamed.
Relevant log output