anthropics / anthropic-sdk-python

How to handle Anthropic 'Overloaded' error when streaming in langchain #688

Open Kevin-McIsaac opened 2 weeks ago

Kevin-McIsaac commented 2 weeks ago

I'm building a RAG app that creates the following LangChain chain:

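from langchain_anthropic import ChatAnthropic
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template(template)
model = ChatAnthropic(model=model_version, temperature=0, max_tokens=1024, timeout=None, max_retries=3)
chain = prompt | model | StrOutputParser()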

which I then stream to a Streamlit UI using

LLM_stream = chain.stream(inputs)  # inputs: dict of the prompt template variables
response: str = ""
for chunk in LLM_stream:
    ....

Mostly this works great. However, sometimes (perhaps 1 in 100 questions to the RAG) I get the error:

anthropic.APIStatusError: {'type': 'error', 'error': {'details': None, 'type': 'overloaded_error', 'message': 'Overloaded'}}

Any thoughts on dealing with this, e.g., should I increase max_retries or add a timeout?
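One possible approach, sketched below under stated assumptions: the traceback in the logs shows the error being raised after a chunk ('d for a non-') had already been received, so the retry has to restart the whole stream and the caller has to discard any partial output. The names stream_with_retry, is_overloaded, and on_restart are hypothetical helpers, not part of LangChain or the Anthropic SDK. The sketch also assumes the raised exception exposes the error payload shown above as a dict on a body attribute.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def is_overloaded(exc: Exception) -> bool:
    """Return True if exc looks like the 'overloaded_error' shown in the issue.

    Assumption: the exception exposes the parsed error payload as a dict on
    a `body` attribute. Anything else is treated as not retryable.
    """
    body = getattr(exc, "body", None)
    if not isinstance(body, dict):
        return False
    error = body.get("error")
    return isinstance(error, dict) and error.get("type") == "overloaded_error"


def stream_with_retry(
    make_stream: Callable[[], Iterable[str]],
    is_retryable: Callable[[Exception], bool],
    on_restart: Optional[Callable[[], None]] = None,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield chunks from make_stream(), restarting the stream on retryable errors.

    A failure can happen mid-stream, so on_restart is called before each
    retry to let the caller throw away chunks it has already displayed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            yield from make_stream()
            return
        except Exception as exc:
            # Give up on the last attempt or on errors we do not recognise.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            if on_restart is not None:
                on_restart()
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

With the chain above this might be used as `for chunk in stream_with_retry(lambda: chain.stream(inputs), is_overloaded, on_restart=reset_ui)`, where reset_ui is whatever clears the partially rendered answer (again a hypothetical name).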

Logs

  File "/home/kmcisaac/Projects/policy_pal/chat/chat_app.py", line 146, in generate_answer
    for chunk in LLM_stream:
        │        └ <generator object RunnableSequence.stream at 0x7f4386fc2c50>
        └ 'd for a non-'

  File "/home/kmcisaac/Projects/policy_pal/.venv/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 3407, in stream
    yield from self.transform(iter([input]), config, **kwargs)
               │    │               │        │         └ {}
               │    │               │        └ None
               │    │               └ {'question': 'What is the minimum casual employment required for a non-LMI loan?', 'level_of_detail': 'Be concise. Proved a o...
               │    └ <function RunnableSequence.transform at 0x7f43a2b89bc0>
               └ ChatPromptTemplate(input_variables=['context', 'level_of_detail', 'question'], input_types={}, partial_variables={}, messages...
  File "/home/kmcisaac/Projects/policy_pal/.venv/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 3394, in transform
    yield from self._transform_stream_with_config(
               │    └ <function Runnable._transform_stream_with_config at 0x7f43a2b88400>
               └ ChatPromptTemplate(input_variables=['context', 'level_of_detail', 'question'], input_types={}, partial_variables={}, messages...
  File "/home/kmcisaac/Projects/policy_pal/.venv/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2197, in _transform_stream_with_config
    chunk: Output = context.run(next, iterator)  # type: ignore
    │               │       │         └ <generator object RunnableSequence._transform at 0x7f43873579c0>
    │               │       └ <method 'run' of '_contextvars.Context' objects>
    │               └ <_contextvars.Context object at 0x7f43767e0ec0>
    └ 'd for a non-'
  File "/home/kmcisaac/Projects/policy_pal/.venv/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 3357, in _transform
    yield from final_pipeline
               └ <generator object BaseTransformOutputParser.transform at 0x7f43a34beb40>
  File "/home/kmcisaac/Projects/policy_pal/.venv/lib/python3.11/site-packages/langchain_core/output_parsers/transform.py", line 64, in transform
    yield from self._transform_stream_with_config(
               │    └ <function Runnable._transform_stream_with_config at 0x7f43a2b88400>
               └ StrOutputParser()
  File "/home/kmcisaac/Projects/policy_pal/.venv/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2197, in _transform_stream_with_config
    chunk: Output = context.run(next, iterator)  # type: ignore
    │               │       │         └ <generator object BaseTransformOutputParser._transform at 0x7f43863f8220>
    │               │       └ <method 'run' of '_contextvars.Context' objects>
    │               └ <_contextvars.Context object at 0x7f43862b5dc0>
    └ 'd for a non-'
  File "/home/kmcisaac/Projects/policy_pal/.venv/lib/python3.11/site-packages/langchain_core/output_parsers/transform.py", line 29, in _transform
    for chunk in input:
        │        └ <itertools._tee object at 0x7f4386fa2600>
        └ AIMessageChunk(content='d for a non-', additional_kwargs={}, response_metadata={}, id='run-d6320e6f-f285-42e5-9c0e-6ad6b44204...
  File "/home/kmcisaac/Projects/policy_pal/.venv/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 1431, in transform
    yield from self.stream(final, config, **kwargs)
               │    │      │      │         └ {}
               │    │      │      └ {'tags': [], 'metadata': {}, 'callbacks': <langchain_core.callbacks.manager.CallbackManager object at 0x7f4384540c10>, 'recur...
               │    │      └ ChatPromptValue(messages=[HumanMessage(content="You are a mortgage broker research assistant. Read the following policy conte...
               │    └ <function BaseChatModel.stream at 0x7f4387b51e40>
               └ ChatAnthropic(model='claude-3-haiku-20240307', temperature=0.0, max_retries=3, anthropic_api_url='https://api.anthropic.com',...
  File "/home/kmcisaac/Projects/policy_pal/.venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 420, in stream
    raise e
  File "/home/kmcisaac/Projects/policy_pal/.venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 400, in stream
    for chunk in self._stream(messages, stop=stop, **kwargs):
        │        │    │       │              │       └ {}
        │        │    │       │              └ None
        │        │    │       └ [HumanMessage(content="You are a mortgage broker research assistant. Read the following policy context very carefully and use...
        │        │    └ <function ChatAnthropic._stream at 0x7f43877f8a40>
        │        └ ChatAnthropic(model='claude-3-haiku-20240307', temperature=0.0, max_retries=3, anthropic_api_url='https://api.anthropic.com',...
        └ ChatGenerationChunk(text='d for a non-', message=AIMessageChunk(content='d for a non-', additional_kwargs={}, response_metada...
  File "/home/kmcisaac/Projects/policy_pal/.venv/lib/python3.11/site-packages/langchain_anthropic/chat_models.py", line 715, in _stream
    for event in stream:
        │        └ <anthropic.Stream object at 0x7f438453b3d0>
        └ RawContentBlockDeltaEvent(delta=TextDelta(text='d for a non-', type='text_delta'), index=0, type='content_block_delta')
  File "/home/kmcisaac/Projects/policy_pal/.venv/lib/python3.11/site-packages/anthropic/_streaming.py", line 68, in __iter__
    for item in self._iterator:
        │       │    └ <generator object Stream.__stream__ at 0x7f438451c040>
        │       └ <anthropic.Stream object at 0x7f438453b3d0>
        └ RawContentBlockDeltaEvent(delta=TextDelta(text='d for a non-', type='text_delta'), index=0, type='content_block_delta')
  File "/home/kmcisaac/Projects/policy_pal/.venv/lib/python3.11/site-packages/anthropic/_streaming.py", line 110, in __stream__
    raise self._client._make_status_error(
          │    │       └ <function Anthropic._make_status_error at 0x7f438776e2a0>
          │    └ <anthropic.Anthropic object at 0x7f4386080b90>
          └ <anthropic.Stream object at 0x7f438453b3d0>

anthropic.APIStatusError: {'type': 'error', 'error': {'details': None, 'type': 'overloaded_error', 'message': 'Overloaded'}}
2024-10-11 09:02:37.734 | INFO | page:main:361 - Sync answers in 2.0s costing 0.00¢

onel commented 1 week ago

I encountered the same problem.

I think the issue here is that the API is a bit inconsistent for streaming messages. For every message that arrives, you can read its type:

for message in response:
    if message.type == 'message_start':
        ...
    elif message.type == 'content_block_delta':
        ...
    elif message.type == 'error':
        ...
    # etc.

The problem is that message.type == 'error' never happens. Instead, the SDK raises an exception from inside the stream iterator: https://github.com/anthropics/anthropic-sdk-python/blob/main/src/anthropic/_streaming.py#L110

I would expect a message of type error to be yielded as well, like the other message types.

The README gives an example of wrapping the .generate() call and catching that error, but nothing is mentioned for the streaming case.
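For the streaming case, the exception surfaces while iterating, not when the stream is created, so the try/except has to wrap the loop itself. A minimal sketch of a wrapper that turns the raised exception into an in-band event with type == 'error', so the dispatch loop above can handle it like any other message. StreamError and with_error_events are hypothetical names and are not part of the Anthropic SDK.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator


@dataclass
class StreamError:
    """Hypothetical in-band error event; not an Anthropic SDK type."""
    error: Exception
    type: str = "error"


def with_error_events(stream: Iterable[Any]) -> Iterator[Any]:
    """Yield every event from stream; if iteration raises, yield a StreamError.

    The try/except surrounds the iteration because the exception is raised
    while consuming the stream, after earlier events were already delivered.
    """
    try:
        yield from stream
    except Exception as exc:
        # The stream ends here; the error is delivered as a final event.
        yield StreamError(error=exc)
```

A consumer would then iterate `for message in with_error_events(response):` and reach the `message.type == 'error'` branch, with the original exception available as message.error.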