holdenmatt opened this issue 2 months ago
Hi @holdenmatt, unfortunately this is a model limitation (same issue noted in https://github.com/anthropics/anthropic-sdk-typescript/issues/454#issuecomment-2221073472). We're planning on improving this with future models.
I see, thanks. If I want faster streaming, would you recommend I move away from tools and try to coax a JSON schema via the system prompt instead?
Hi @holdenmatt -- one clarification to the above: we stream out each key/value pair together, so long values will result in buffering (the delays you're seeing). In the example you provided, Claude is producing a poem (a long string) as a value, which is why you're seeing the delay. However, a large object with many smaller keys/values wouldn't have this issue.
> If I want faster streaming, would you recommend I move away from tools and try to coax a JSON schema via the system prompt instead?
That could work; the delay you're seeing should only happen with that specific kind of tool use (where Claude is producing long keys/values).
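For what it's worth, here's a minimal sketch of that alternative, assuming the `@anthropic-ai/sdk` `messages.stream` helper and a made-up single-field schema (an illustration of the idea, not an official recommendation):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function main() {
  // Ask for JSON via the system prompt instead of a tool definition, so the
  // output arrives as ordinary text deltas rather than buffered tool input.
  const stream = client.messages.stream({
    model: "claude-3-5-sonnet-20240620",
    max_tokens: 1024,
    system:
      'Respond ONLY with a JSON object of the form {"poem": string}. No prose, no code fences.',
    messages: [{ role: "user", content: "Write a short poem about the sea." }],
  });

  stream.on("text", (delta) => {
    process.stdout.write(delta); // each chunk arrives as soon as it's generated
  });

  await stream.finalMessage();
}

main();
```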
Ah, that would explain why I run into this but other folks I talk to haven't seen it.
The specific use case for me is generating LaTeX code from text prompts for https://texsandbox.com/
The LaTeX output could be long, depending on the prompt. The reason I use function calling instead of text completion is that I want to allow the model to "branch" between the good "latex" case and an "error" case if it doesn't know what to do, or if e.g. the input prompt doesn't make sense.
I could avoid tools here if that would improve streaming, but I'd need some other way to signal "this is valid code" vs "this is an error message".
FYI: I fixed this by moving away from tool calling, and streaming now feels fast again.
I hacked together my own poor man's function calling on top of plain text generation, by prompting the model to write
This works fine (so you can close this if you like), but it was the biggest issue I ran into switching from gpt-4o to claude-3.5-sonnet. I quite often use functions/tools with long JSON values, so consider this a feature request to improve this in future models. Thanks!
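To illustrate the kind of text-based fallback being described (the exact prompt isn't included above, so this is an approximation, with made-up `<latex>`/`<error>` tags and the `@anthropic-ai/sdk` streaming helper, not holdenmatt's actual setup):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// "Poor man's function calling": the system prompt asks the model to pick a
// branch by emitting either <latex>...</latex> or <error>...</error>, and the
// client decides how to render based on which tag shows up first.
async function generateLatex(prompt: string) {
  const stream = client.messages.stream({
    model: "claude-3-5-sonnet-20240620",
    max_tokens: 2048,
    system:
      "If the request can be turned into LaTeX, reply with <latex>...</latex> " +
      "containing only the code. Otherwise reply with <error>...</error> " +
      "containing a short explanation.",
    messages: [{ role: "user", content: prompt }],
  });

  let buffer = "";
  let mode: "unknown" | "latex" | "error" = "unknown";

  stream.on("text", (delta) => {
    buffer += delta;
    if (mode === "unknown") {
      if (buffer.includes("<latex>")) mode = "latex";
      else if (buffer.includes("<error>")) mode = "error";
    }
    // Stream the body of whichever branch we're in to the UI as it arrives.
  });

  await stream.finalMessage();
  return { mode, text: buffer };
}
```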
Is there an issue we can track for improvements to streaming + tool use, or do you plan to post updates here?
Hey team, is there a planned date for fixing this? This is a big limiter on our user experience for code-gen. Since the result is returned as a stream anyway, is there a way to get those deltas earlier?
+1, I think this basically makes tool use not viable for our use case. It's not limited to the TypeScript SDK; it's also a problem in Python.
If this helps, there's a hacky workaround, similar to the solution mentioned above, that's currently working for me and someone else: stream raw text with a forced JSON format in the prompt, then progressively resolve the partial text into an object as it arrives. It's surprisingly reliable so far.
https://github.com/vercel/ai/issues/3422#issuecomment-2450459211
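For anyone wondering what "progressively resolving" the streamed text can look like, here's a simplified sketch (my own approximation, not the code from the linked comment): it assumes the `@anthropic-ai/sdk` streaming helper and only handles a single known string field.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Best-effort extraction of one string field (e.g. "latex") from JSON that is
// still streaming in. Escape sequences are kept verbatim rather than decoded;
// a final JSON.parse on the complete text remains the source of truth.
function extractPartialField(partialJson: string, field: string): string | null {
  const keyIndex = partialJson.indexOf(`"${field}"`);
  if (keyIndex === -1) return null;
  const colon = partialJson.indexOf(":", keyIndex);
  if (colon === -1) return null;
  const openQuote = partialJson.indexOf('"', colon + 1);
  if (openQuote === -1) return null;

  let value = "";
  for (let i = openQuote + 1; i < partialJson.length; i++) {
    const ch = partialJson[i];
    if (ch === "\\") {
      value += ch + (partialJson[i + 1] ?? "");
      i++; // skip the escaped char so an escaped quote doesn't end the value
    } else if (ch === '"') {
      break; // closing quote has arrived; the value is complete
    } else {
      value += ch;
    }
  }
  return value;
}

async function main() {
  const stream = client.messages.stream({
    model: "claude-3-5-sonnet-20240620",
    max_tokens: 2048,
    system: 'Respond ONLY with a JSON object of the form {"latex": string}.',
    messages: [{ role: "user", content: "A right triangle with legs a and b" }],
  });

  let soFar = "";
  stream.on("text", (delta) => {
    soFar += delta;
    const latex = extractPartialField(soFar, "latex");
    if (latex !== null) process.stdout.write("\rpartial latex: " + latex);
  });

  await stream.finalMessage();
  // Assumes the model followed the format; the complete text is authoritative.
  console.log("\nfinal:", JSON.parse(soFar).latex);
}

main();
```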
(Sorry if this isn't the right place to report this, I wasn't sure).
I'm trying to switch from gpt-4o to claude-3.5-sonnet in an app I'm building, but high streaming tool latency is preventing me from doing so. Looks like this was discussed in #454 but wondering how I should proceed?
The total latency of Claude vs gpt-4o is pretty similar, and I think fine.
The issue is that Claude waits a long time before any content is streamed (I often see ~5s delays vs ~500ms for gpt-4o). This is a poor user experience in my app, because users get no feedback that any generation is happening. This will prevent me from switching, even though I much prefer Claude's output quality!
Do you have any plans to fix this? Or do you recommend not using tools + streaming with Claude?
Example timing and test code below, if helpful.
Timing comparison

claude-3-5-sonnet:
Stream created at 0ms
First content received: 4645ms
Streaming time: 46ms
Total time: 4691ms

gpt-4o:
Stream created at 343ms
First content received: 368ms
Streaming time: 2100ms
Total time: 2468ms

Test code:
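The original test snippet wasn't captured above. For completeness, here's a sketch of the kind of timing harness described, on the Claude side, using a made-up write_poem tool with the `@anthropic-ai/sdk` streaming API (an approximation, not holdenmatt's actual code):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function main() {
  const start = Date.now();

  // Force a tool call whose input contains one long string value ("poem"),
  // which is the case where the first delta is delayed.
  const stream = await client.messages.create({
    model: "claude-3-5-sonnet-20240620",
    max_tokens: 1024,
    stream: true,
    tools: [
      {
        name: "write_poem",
        description: "Write a poem for the user",
        input_schema: {
          type: "object",
          properties: { poem: { type: "string" } },
          required: ["poem"],
        },
      },
    ],
    tool_choice: { type: "tool", name: "write_poem" },
    messages: [{ role: "user", content: "Write a short poem about the sea." }],
  });
  console.log(`Stream created at ${Date.now() - start}ms`);

  let firstContentAt: number | null = null;
  for await (const event of stream) {
    // Tool input streams as content_block_delta events with input_json_delta payloads.
    if (
      event.type === "content_block_delta" &&
      event.delta.type === "input_json_delta" &&
      firstContentAt === null
    ) {
      firstContentAt = Date.now() - start;
      console.log(`First content received: ${firstContentAt}ms`);
    }
  }
  console.log(`Total time: ${Date.now() - start}ms`);
}

main();
```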