google-gemini / generative-ai-python

The official Python library for the Google Gemini API
https://pypi.org/project/google-generativeai/
Apache License 2.0

503 The service is currently unavailable when using Context caching Feature #500

Open okada1220 opened 2 months ago

okada1220 commented 2 months ago

Description of the bug:

I'm trying to create a cache from the contents of multiple PDF files, but when the total number of tokens across the files exceeds approximately 500,000, I receive a 503 (Service Unavailable) error from Google API Core.

The error isn't returned immediately but only after about 40 to 50 seconds, which might indicate that a timeout is occurring in Google API Core.

Code

import google.generativeai as genai
import os

gemini_api_key = os.environ.get("GEMINI_API_KEY")
genai.configure(api_key=gemini_api_key)

documents = []
file_list = ["xxx.pdf", "yyy.pdf", ...]
for file in file_list:
  gemini_file = genai.upload_file(path=file, display_name=file)
  documents.append(gemini_file)

gemini_client = genai.GenerativeModel("models/gemini-1.5-flash-001")
total_token = gemini_client.count_tokens(documents).total_tokens
print(f"total_token: {total_token}")
# total_token: 592403

gemini_cache = genai.caching.CachedContent.create(model="models/gemini-1.5-flash-001", display_name="sample", contents=documents)
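
The error arrives only after the 40-50 second delay mentioned above; this is roughly how I timed it (a minimal sketch of my own, reusing documents from the repro above):

import time

start = time.monotonic()
try:
    genai.caching.CachedContent.create(
        model="models/gemini-1.5-flash-001",
        display_name="sample",
        contents=documents,
    )
except Exception as exc:
    # The 503 consistently shows up after roughly 40-50 seconds.
    print(f"create() failed after {time.monotonic() - start:.1f}s: {exc}")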

Version

Actual vs expected behavior:

Actual behavior

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1176, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1005, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "The service is currently unavailable."
    debug_error_string = "UNKNOWN:Error received from peer ipv4:172.217.175.234:443 {created_time:"2024-08-06T13:37:03.077186006+09:00", grpc_status:14, grpc_message:"The service is currently unavailable."}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/google/generativeai/caching.py", line 219, in create
    response = client.create_cached_content(request)
  File "/usr/local/lib/python3.9/site-packages/google/ai/generativelanguage_v1beta/services/cache_service/client.py", line 874, in create_cached_content
    response = rpc(
  File "/usr/local/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ServiceUnavailable: 503 The service is currently unavailable.

Expected behavior

gemini_cache = genai.caching.CachedContent.create(model="models/gemini-1.5-flash-001", display_name="sample", contents=documents)
print(gemini_cache)

# CachedContent(
#     name='cachedContents/l5ataay9naq2',
#     model='models/gemini-1.5-flash-001',
#     display_name='sample',
#     usage_metadata={
#         'total_token_count': 592403,
#     },
#     create_time=2024-08-08 01:21:44.925021+00:00,
#     update_time=2024-08-08 01:21:44.925021+00:00,
#     expire_time=2024-08-08 02:21:43.787890+00:00
# )

Any other information you'd like to share?

Upon reviewing the Gemini API documentation, I noticed something inconsistent with this behavior. The maximum token count is described as depending on the specific model in use, and I'm using the models/gemini-1.5-flash-001 model, which has a maximum input token limit of 1,048,576. Based on this, I assumed that processing around 500,000 tokens should work without any issues.
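
For what it's worth, the model's limit can also be read from the SDK itself; a minimal sketch using genai.get_model (which, as far as I can tell, reports the same limits as the documentation):

model_info = genai.get_model("models/gemini-1.5-flash-001")
# input_token_limit should print 1048576 for this model.
print(model_info.input_token_limit)
print(model_info.output_token_limit)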

Moreover, I was able to generate a cache successfully even with token counts exceeding 800,000 when the cache was created from a plain string. This leads me to suspect a bug specific to creating caches from uploaded files with high token counts, as opposed to string-based caching.
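
Roughly what the successful string-based attempt looked like (a minimal sketch from memory; the repetition count and display name are illustrative, so the actual token total should be confirmed with count_tokens first):

long_text = "lorem ipsum dolor sit amet " * 150_000  # aiming for well over 800k tokens
print(gemini_client.count_tokens(long_text).total_tokens)

string_cache = genai.caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    display_name="string-sample",  # hypothetical name
    contents=[long_text],
)
print(string_cache.usage_metadata)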

gurugecl commented 2 months ago

I'm experiencing the same issue even when using models/gemini-1.5-pro-001 and trying to cache roughly 300k tokens, even though that model has an input token limit of 2,097,152.

singhniraj08 commented 2 months ago

@okada1220,

Thank you for reporting this issue. This looks like an intermittent error and should work now. Automatic retry logic has been added to the SDK to avoid these errors; you can follow FR #502 for examples of retry logic. Thanks

okada1220 commented 2 months ago

@singhniraj08 Thank you for your response.

I checked again, and it seems that the same error is still occurring...

I looked at the retry logic example in #502, which seems to apply when passing request_options to generate_content. But since I'm using genai.caching.CachedContent.create, which doesn't accept request_options, I'm wondering whether that retry logic is applicable here. Do you think this approach will work in my case?
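
In the meantime, I've been wrapping the call in my own retry (a minimal sketch using google.api_core's Retry helper, which can match ServiceUnavailable directly; the backoff values are just my guesses):

from google.api_core import exceptions, retry

@retry.Retry(
    predicate=retry.if_exception_type(exceptions.ServiceUnavailable),
    initial=1.0,     # first backoff, in seconds
    multiplier=2.0,  # exponential backoff
    maximum=30.0,    # cap on the interval between attempts
    timeout=300.0,   # give up after 5 minutes overall
)
def create_cache_with_retry():
    return genai.caching.CachedContent.create(
        model="models/gemini-1.5-flash-001",
        display_name="sample",
        contents=documents,
    )

gemini_cache = create_cache_with_retry()

That said, if the 503 is deterministic for large file-based caches (as it appears to be), retrying may just repeat the same failure.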

nate-walter commented 6 hours ago

I'm receiving this error too.