google-gemini / generative-ai-python

The official Python library for the Google Gemini API
https://pypi.org/project/google-generativeai/
Apache License 2.0

503 The service is currently unavailable when using Context caching Feature #500

Open okada1220 opened 2 months ago

okada1220 commented 2 months ago

Description of the bug:

I'm trying to create a cache from the contents of multiple PDF files, but when the total number of tokens across the files exceeds approximately 500,000, I receive a 503 (Service Unavailable) error from Google API Core.

The error isn't returned immediately but only after about 40 to 50 seconds, which might indicate that a timeout is occurring in Google API Core.

Code

import google.generativeai as genai
import os

gemini_api_key = os.environ.get("GEMINI_API_KEY")
genai.configure(api_key=gemini_api_key)

documents = []
file_list = ["xxx.pdf", "yyy.pdf", ...]
for file in file_list:
  gemini_file = genai.upload_file(path=file, display_name=file)
  documents.append(gemini_file)

gemini_client = genai.GenerativeModel("models/gemini-1.5-flash-001")
total_token = gemini_client.count_tokens(documents).total_tokens
print(f"total_token: {total_token}")
# total_token: 592403

gemini_cache = genai.caching.CachedContent.create(model="models/gemini-1.5-flash-001", display_name="sample", contents=documents)
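
The error arrives only after the 40-50 second delay mentioned above; this is roughly how I timed it (a minimal sketch of my own, reusing documents from the repro above):

import time

start = time.monotonic()
try:
    genai.caching.CachedContent.create(
        model="models/gemini-1.5-flash-001",
        display_name="sample",
        contents=documents,
    )
except Exception as exc:
    # The 503 consistently shows up after roughly 40-50 seconds.
    print(f"create() failed after {time.monotonic() - start:.1f}s: {exc}")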

Version

Actual vs expected behavior:

Actual behavior

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1176, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1005, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "The service is currently unavailable."
    debug_error_string = "UNKNOWN:Error received from peer ipv4:172.217.175.234:443 {created_time:"2024-08-06T13:37:03.077186006+09:00", grpc_status:14, grpc_message:"The service is currently unavailable."}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/google/generativeai/caching.py", line 219, in create
    response = client.create_cached_content(request)
  File "/usr/local/lib/python3.9/site-packages/google/ai/generativelanguage_v1beta/services/cache_service/client.py", line 874, in create_cached_content
    response = rpc(
  File "/usr/local/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ServiceUnavailable: 503 The service is currently unavailable.

Expected behavior

gemini_cache = genai.caching.CachedContent.create(model="models/gemini-1.5-flash-001", display_name="sample", contents=documents)
print(gemini_cache)

# CachedContent(
#     name='cachedContents/l5ataay9naq2',
#     model='models/gemini-1.5-flash-001',
#     display_name='sample',
#     usage_metadata={
#         'total_token_count': 592403,
#     },
#     create_time=2024-08-08 01:21:44.925021+00:00,
#     update_time=2024-08-08 01:21:44.925021+00:00,
#     expire_time=2024-08-08 02:21:43.787890+00:00
# )

Any other information you'd like to share?

Upon reviewing the Gemini API documentation, I noticed something inconsistent with this behavior. The maximum token count is described as depending on the specific model in use, and I'm using the models/gemini-1.5-flash-001 model, which has a maximum input token limit of 1,048,576. Based on this, I assumed that processing around 500,000 tokens should work without any issues.
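
For what it's worth, the model's limit can also be read from the SDK itself; a minimal sketch using genai.get_model (which, as far as I can tell, reports the same limits as the documentation):

model_info = genai.get_model("models/gemini-1.5-flash-001")
# input_token_limit should print 1048576 for this model.
print(model_info.input_token_limit)
print(model_info.output_token_limit)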

Moreover, I was able to generate a cache successfully even with token counts exceeding 800,000 when the cache was created from a plain string. This leads me to suspect a bug specific to creating caches from uploaded files with high token counts, as opposed to string-based caching.
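
Roughly what the successful string-based attempt looked like (a minimal sketch from memory; the repetition count and display name are illustrative, so the actual token total should be confirmed with count_tokens first):

long_text = "lorem ipsum dolor sit amet " * 150_000  # aiming for well over 800k tokens
print(gemini_client.count_tokens(long_text).total_tokens)

string_cache = genai.caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    display_name="string-sample",  # hypothetical name
    contents=[long_text],
)
print(string_cache.usage_metadata)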

gurugecl commented 2 months ago

I'm experiencing the same issue even when using models/gemini-1.5-pro-001 and trying to cache roughly 300k tokens, even though that model has an input token limit of 2,097,152.

singhniraj08 commented 2 months ago

@okada1220,

Thank you for reporting this issue. This looks like an intermittent error and should work now. Automatic retry logic has been added to the SDK to avoid these errors; you can follow FR #502 for examples of retry logic. Thanks

okada1220 commented 2 months ago

@singhniraj08 Thank you for your response.

I checked again, and it seems that the same error is still occurring...

I looked at the retry logic example in #502, which seems to apply when passing request_options to generate_content. But since I'm using genai.caching.CachedContent.create, which doesn't accept request_options, I'm wondering whether that retry logic is applicable here. Do you think this approach will work in my case?
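
In the meantime, I've been wrapping the call in my own retry (a minimal sketch using google.api_core's Retry helper, which can match ServiceUnavailable directly; the backoff values are just my guesses):

from google.api_core import exceptions, retry

@retry.Retry(
    predicate=retry.if_exception_type(exceptions.ServiceUnavailable),
    initial=1.0,     # first backoff, in seconds
    multiplier=2.0,  # exponential backoff
    maximum=30.0,    # cap on the interval between attempts
    timeout=300.0,   # give up after 5 minutes overall
)
def create_cache_with_retry():
    return genai.caching.CachedContent.create(
        model="models/gemini-1.5-flash-001",
        display_name="sample",
        contents=documents,
    )

gemini_cache = create_cache_with_retry()

That said, if the 503 is deterministic for large file-based caches (as it appears to be), retrying may just repeat the same failure.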

nate-walter commented 6 hours ago

I'm receiving this error too.