google-gemini / generative-ai-python

The official Python library for the Google Gemini API
https://pypi.org/project/google-generativeai/
Apache License 2.0

Gemini Flash error: None Stream removed #616

Open barun-saha opened 2 weeks ago

barun-saha commented 2 weeks ago

Description of the bug:

I'm trying out the Gemini 1.5 Flash (002) API and its long-context capability. I prompt the LLM with the contents of a few (10) large PDF files. In the first interaction, I ask it to list the titles of the documents (to verify that the file contents are available and the model can read them). This appears to work fine: the titles are listed, and the total token count is reported to be about 290K.

import google.generativeai as genai

# Assumed setup (not shown in the original report):
# genai.configure(api_key=...)
# model = genai.GenerativeModel('gemini-1.5-flash-002')
# files = [genai.upload_file(path) for path in pdf_paths]

chat_session = model.start_chat(history=[])
response = await chat_session.send_message_async(
    [f'Carefully look at the {len(files)} documents provided here and list their titles.'] + files,
    stream=True,
)
async for chunk in response:
    print(chunk.text, end='')

Next, in the same chat session, I ask it to summarize the documents, as indicated in the code below:

import asyncio
import random
import traceback

# REVIEW_PROMPT is a long summarization prompt defined elsewhere.
review = ''
max_retries = 3

while max_retries > 0:
    try:
        response = await chat_session.send_message_async(
            [REVIEW_PROMPT.strip()],
            stream=True,
        )

        async for chunk in response:
            print('.', end='')
            review += chunk.text

        print('')
        break
    except Exception as ex:
        print(f'*** An error occurred while receiving chat response: {ex}')
        max_retries -= 1
        traceback.print_exc()

        if max_retries > 0:
            wait_time = random.uniform(5, 7)
            print(f'Retrying again in {wait_time} seconds...')
            # Drop the failed turn so the retry starts from a clean history.
            chat_session.rewind()
            # Non-blocking sleep, since this runs inside a coroutine.
            await asyncio.sleep(wait_time)

However, this invocation almost always results in the following error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/google/api_core/grpc_helpers_async.py", line 106, in _wrapped_aiter
    async for response in self._call:  # pragma: no branch
  File "/opt/conda/lib/python3.10/site-packages/grpc/aio/_call.py", line 365, in _fetch_stream_responses
    await self._raise_for_status()
  File "/opt/conda/lib/python3.10/site-packages/grpc/aio/_call.py", line 272, in _raise_for_status
    raise _create_rpc_error(
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "Stream removed"
    debug_error_string = "UNKNOWN:Error received from peer ipv4:<IP_ADDRESS>:443 {created_time:"2024-11-01T09:50:17.098911997+00:00", grpc_status:2, grpc_message:"Stream removed"}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/ipykernel_30/2992338197.py", line 15, in <module>
    async for chunk in response:
  File "/opt/conda/lib/python3.10/site-packages/google/generativeai/types/generation_types.py", line 727, in __aiter__
    raise self._error
  File "/opt/conda/lib/python3.10/site-packages/google/generativeai/types/generation_types.py", line 736, in __aiter__
    item = await anext(self._iterator)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/google/api_core/grpc_helpers_async.py", line 109, in _wrapped_aiter
    raise exceptions.from_grpc_error(rpc_error) from rpc_error
google.api_core.exceptions.Unknown: None Stream removed

I have also tried running it with streaming disabled, but it throws the same error. Rewinding the chat session and retrying results in the same error as well.
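For completeness, the non-streaming variant I tried looks roughly like this (a sketch; REVIEW_PROMPT is the same prompt as above):

# Non-streaming variant (sketch): the same error is raised when the
# response is resolved.
response = await chat_session.send_message_async([REVIEW_PROMPT.strip()])
review = response.text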

How can I address this error and continue the chat?

Actual vs expected behavior:

The expected behavior is to receive the complete response from the model without any run-time exception.

Any other information you'd like to share?

Just to clarify, the code works on rare occasions. Also, I'm running the code on Kaggle (!pip install google-generativeai==0.8.3 grpcio-status).

manojssmk commented 2 weeks ago

Hi @barun-saha

I ran the same code on my local setup, and it worked fine. However, when I run it on Kaggle, I encounter the same error as you. This issue may be due to network instability on Kaggle's end. If possible, try running the code on a stable network or on a local setup.
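If it is network instability, one thing worth trying is a longer client-side timeout plus an explicit retry policy. A minimal sketch, assuming the request_options parameter of google-generativeai 0.8.x:

from google.api_core import retry

# Sketch: ask the client library to retry transient failures itself.
# 'timeout' and 'retry' are the keys accepted by request_options.
response = chat_session.send_message(
    [REVIEW_PROMPT.strip()],
    request_options={
        'timeout': 600,  # seconds; large contexts can take a while
        'retry': retry.Retry(initial=5.0, multiplier=2.0, maximum=60.0, timeout=300.0),
    },
)
print(response.text)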

Thanks

barun-saha commented 2 weeks ago

Hi Manoj,

Thanks for your input.

I tried running the code in a Colab notebook using the synchronous send_message method (send_message_async leads to an error on Colab, but that's a different issue). Surprisingly, the code ran without any error! I tried sending several chat messages and got responses back with no errors.

Out of curiosity, I went back to Kaggle and used send_message. Unfortunately, I got the same error there with the synchronous call as well.
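For reference, the synchronous variant I used looks roughly like this (a sketch of the same call, without the retry loop):

# Synchronous streaming call (sketch): works on Colab, but still fails
# on Kaggle with "Stream removed".
response = chat_session.send_message([REVIEW_PROMPT.strip()], stream=True)
for chunk in response:
    print('.', end='')
    review += chunk.text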

Therefore, I agree with your observation that this might be more of an environment-specific issue.

manojssmk commented 2 weeks ago

@barun-saha

I encountered the same error even with send_message. Kaggle handles smaller token counts well, but in your case the ~290K-token context appears to be causing issues.
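To confirm how large the accumulated context is, you can check the token count of the chat history with count_tokens (a sketch):

# Sketch: report how many tokens the chat history (including the
# uploaded PDFs referenced in it) occupies.
print(model.count_tokens(chat_session.history).total_tokens)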

Thanks

github-actions[bot] commented 2 days ago

Marking this issue as stale since it has been open for 14 days with no activity. This issue will be closed if no further activity occurs.