cohere-ai / cohere-python

Python Library for Accessing the Cohere API
https://docs.cohere.ai
MIT License

re: Exception ignored in: <generator object BaseCohere.chat_stream at 0x1541109e0> #578

Closed cleesmith closed 1 month ago

cleesmith commented 1 month ago

SDK Version (required)

... using:
- macOS Sonoma 14.6
- python 3.10.14
- pip show cohere
Name: cohere
Version: 5.9.1
Summary: 
Home-page: 
Author: 
Author-email: 
License: 
Location: /opt/miniconda3/envs/clsProxy/lib/python3.10/site-packages
Requires: boto3, fastavro, httpx, httpx-sse, parameterized, pydantic, pydantic-core, requests, tokenizers, types-requests, typing_extensions
Required-by: 

Describe the bug I see in the Terminal: Exception ignored in: <generator object BaseCohere.chat_stream at 0x1541109e0> ... every time I click a button to abort the stream before it's done. Aborting is necessary because sometimes models go a bit crazy and won't stop responding, or respond with obvious nonsense.

I think there's a bug in chat_stream's handling of GeneratorExit, or something similar.

The following code works properly for: Anthropic, Google/Gemini, Groq, LMStudio, Mistral, Ollama, OpenAI, OpenRouter, Perplexity, but not Cohere ...

import os      # imports assumed for this snippet
import cohere

async def CohereResponseStreamer(prompt):
    if MODEL is None: yield ""; return # sometimes model list is empty
    try:
        co = cohere.Client(
            api_key=os.environ["CO_API_KEY"], 
            timeout=30
        )
        params = {
            "model": MODEL,
            "message": prompt,
        }
        if TEMP is not None:
            params["temperature"] = TEMP
        safety_unsupported_models = ["command", "command-r-03-2024", "command-r", "command-light", "command-light-nightly", "c4ai-aya-23-35b"]
        if MODEL not in safety_unsupported_models:
            params["safety_mode"] = "NONE"

        # try:
        stream = co.chat_stream(**params)
        for chunk in stream:
            if ABORT:
                set_abort(False)
                yield "\n... response stopped by button click."  # no placeholders, so no f-string needed
                stream.close()  # properly close the generator
                break  # exit the generator cleanly
            if chunk.event_type == "text-generation":
                content = chunk.text
                if isinstance(content, str):
                    cleaned_content = content.replace("**", "")  # no ugly Markdown in plain text
                    yield cleaned_content
                else:
                    yield ""
        # except GeneratorExit:
        #   print("CohereResponseStreamer: GeneratorExit caught, closing stream.")
        #   stream.close()  # ensure the generator is closed even on GeneratorExit
        #   return  # exit gracefully after handling GeneratorExit
        # except Exception as e:
        #   print(f"CohereResponseStreamer exception:\n{e}\nco:\n{co}\n")
        #   stream.close()  # close the stream on other exceptions too
        #   yield f"Error:\nCohere's response for model: {MODEL}\n{e}"

    except Exception as e:
        yield f"Error:\nCohere's response for model: {MODEL}\n{e}"
... the error happens with and without the try/excepts.

In the Terminal:
Exception ignored in: <generator object BaseCohere.chat_stream at 0x1541109e0>
Traceback (most recent call last):
  File "/Users/cleesmith/clsProxy/aidetour_simple_chat.py", line 758, in run_streamer
    async for chunk in streamer_function(prompt):
RuntimeError: generator ignored GeneratorExit
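For reference, this RuntimeError can be reproduced without the SDK at all: Python raises "generator ignored GeneratorExit" whenever a generator yields again after close() has thrown GeneratorExit into it. A minimal sketch (plain Python, no Cohere code involved):

```python
def leaky():
    try:
        while True:
            yield 1
    except GeneratorExit:
        # Yielding again after GeneratorExit is illegal: the caller's
        # close() will raise "generator ignored GeneratorExit".
        yield -1

g = leaky()
next(g)            # start the generator
try:
    g.close()      # throws GeneratorExit into the generator at its yield
except RuntimeError as e:
    print(e)       # prints: generator ignored GeneratorExit
```

So if anything along the SDK's internal generator chain yields (or otherwise refuses to exit) while being closed, the consumer sees exactly this traceback.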

Screenshots See the attached screenshot (cohere_bug_generator).

cleesmith commented 1 month ago

I tried upgrading to 5.9.2, 5.9.3, and 5.9.4, and the generator runtime error still occurs. This code seems to be getting close:

async def CohereResponseStreamer(prompt):
    if MODEL is None: yield ""; return # sometimes model list is empty
    try:
        co = cohere.Client(
            api_key=os.environ["CO_API_KEY"], 
            timeout=30
        )
        params = {
            "model": MODEL,
            "message": prompt,
        }
        if TEMP is not None:
            params["temperature"] = TEMP
        safety_unsupported_models = ["command", "command-r-03-2024", "command-r", "command-light", "command-light-nightly", "c4ai-aya-23-35b"]
        if MODEL not in safety_unsupported_models:
            params["safety_mode"] = "NONE"
        # params["accepts"] = "text/event-stream" # doesn't help generator issue
        try:
            stream = co.chat_stream(**params)
            for chunk in stream:
                if ABORT:
                    set_abort(False)
                    yield "\n... response stopped by button click."  # no placeholders, so no f-string needed
                    stream.close()
                    return  # exit the generator cleanly?
                if chunk.event_type == "text-generation":
                    content = chunk.text
                    if isinstance(content, str):
                        cleaned_content = content.replace("**", "")
                        yield cleaned_content
                    else:
                        yield ""
        except GeneratorExit:
            return
        except RuntimeError:
            pass

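A pattern that avoids the RuntimeError on the caller's side is to never yield after GeneratorExit and to close the inner stream in a finally block, which runs on normal exhaustion, break, and close() alike. A sketch, with safe_streamer as a hypothetical wrapper name (the stream argument stands in for co.chat_stream(...)):

```python
def safe_streamer(stream):
    """Wrap any stream generator so the underlying stream is always closed.

    On GeneratorExit we must NOT yield again; the finally clause closes
    the inner stream and the exception propagates out cleanly.
    """
    try:
        for chunk in stream:
            yield chunk
    finally:
        # Runs whether the loop finished, broke, or was close()d early.
        close = getattr(stream, "close", None)
        if close is not None:
            close()
```

This only guarantees correct behavior for the wrapper itself; if the SDK's own generator misbehaves when closed, the error can still surface from inside the SDK.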
I suppose the only option left is to use the Cohere models via an API key with OpenRouter ... any other ideas?

It might help to review the code here: https://github.com/openai/openai-python/blob/6172976b16821b24194a05e3e3fe5cb2342a2b4b/src/openai/lib/streaming/chat/_completions.py#L121 ... notice their comment:

This context manager ensures the response cannot be leaked if you don't read
    the stream to completion.

... as currently the Cohere Python SDK leaks the response stream.
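The idea in that OpenAI comment can be sketched generically with contextlib: a context manager that guarantees the stream is closed even if the caller abandons it mid-iteration. Names here are illustrative, not Cohere's API:

```python
from contextlib import contextmanager

@contextmanager
def streaming_response(stream):
    """Ensure a stream generator is closed even if not read to completion.

    `stream` is any generator-like object with a close() method.
    """
    try:
        yield stream
    finally:
        stream.close()

# usage sketch (co and params as in the snippets above):
# with streaming_response(co.chat_stream(**params)) as stream:
#     for chunk in stream:
#         ...
#         if aborted:
#             break  # the stream is still closed on exiting the with-block
```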

billytrend-cohere commented 1 month ago

Hey @cleesmith, many thanks for this detailed report. I have shared this with our SDK generation vendor so they can upstream a fix. Please track this issue! https://github.com/fern-api/fern/issues/4817