[Bug]: InactiveRPCError when sending base64 encoded mp3 data

bent-verbiage commented 6 months ago

File Name

gemini/getting-started/intro_gemini_1_5_pro.ipynb

What happened?

The Gemini 1.5 Pro notebook includes the section below, but unfortunately all of its examples use files hosted on Google Cloud.

Audio understanding
Gemini 1.5 Pro can directly process audio for long-context understanding.
audio_file_path = "cloud-samples-data/generative-ai/audio/pixel.mp3"

Please note: the code below works for me when I use Part.from_uri, so authentication and setup are working, but it fails with an _InactiveRpcError (see log output) when I use a local file encoded to base64 with Part.from_data.

import base64

import vertexai
from vertexai.generative_models import GenerativeModel, Part

# TODO(developer): Update and un-comment below line
project_id = PROJECT_ID

vertexai.init(project=project_id, location="us-central1")

model = GenerativeModel("gemini-1.5-pro-preview-0409")

prompt = """
  Please provide a summary for the audio.
  Provide chapter titles with timestamps, be concise and short, no need to provide chapter summaries.
  Do not make up any information that is not part of the audio and do not be verbose.
"""

audio_file_uri = "test.mp3"
with open(audio_file_uri, 'rb') as file:
    binary_data = file.read()
    base64_encoded_data = base64.b64encode(binary_data)

# audio_file_uri = "gs://cloud-samples-data/generative-ai/audio/pixel.mp3"
# audio_file = Part.from_uri(audio_file_uri, mime_type="audio/mpeg")
audio_file = Part.from_data(base64_encoded_data, "audio/mpeg")

contents = [audio_file, prompt]
response = model.generate_content(contents)
print(response.text)

I made sure that I'm on the latest version. I have also tried changing the mime_type to "audio/mp3", but with the same result. Help?

Relevant log output

---------------------------------------------------------------------------
_InactiveRpcError                         Traceback (most recent call last)
File ~/anaconda3/envs/llm-latest-packages/lib/python3.10/site-packages/google/api_core/grpc_helpers.py:72, in _wrap_unary_errors.<locals>.error_remapped_callable(*args, **kwargs)
     71 try:
---> 72     return callable_(*args, **kwargs)
     73 except grpc.RpcError as exc:

File ~/anaconda3/envs/llm-latest-packages/lib/python3.10/site-packages/grpc/_channel.py:1161, in _UnaryUnaryMultiCallable.__call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
   1155 (
   1156     state,
   1157     call,
   1158 ) = self._blocking(
   1159     request, timeout, metadata, credentials, wait_for_ready, compression
   1160 )
-> 1161 return _end_unary_response_blocking(state, call, False, None)

File ~/anaconda3/envs/llm-latest-packages/lib/python3.10/site-packages/grpc/_channel.py:1004, in _end_unary_response_blocking(state, call, with_call, deadline)
   1003 else:
-> 1004     raise _InactiveRpcError(state)

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INTERNAL
    details = "Internal error encountered."
    debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B2404:6800:4006:80a::200a%5D:443 {created_time:"2024-04-22T18:12:49.452962+10:00", grpc_status:13, grpc_message:"Internal error encountered."}"

gericdong commented 6 months ago

@bent-verbiage base64.b64encode returns bytes, so you will need to convert base64_encoded_data to a string before passing it to Part.from_data(), for example:

encoded_string = base64_encoded_data.decode('utf-8')
...
audio_file = Part.from_data(encoded_string, "audio/mpeg")

Please try. Thanks.
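
For reference, the conversion suggested above can be sketched as a small standalone helper. This is a minimal sketch using only the standard library; the function name load_audio_as_base64_str is hypothetical and not part of the Vertex AI SDK, and the Part.from_data call is left as a comment because it needs the vertexai package and valid credentials.

```python
import base64
from pathlib import Path


def load_audio_as_base64_str(path: str) -> str:
    """Read a local file and return its contents as a base64 text string.

    Hypothetical helper for illustration; not part of the Vertex AI SDK.
    """
    binary_data = Path(path).read_bytes()
    # b64encode returns a bytes object, not a str.
    base64_encoded_data = base64.b64encode(binary_data)
    # Convert the base64 bytes to str, as suggested in the comment above.
    return base64_encoded_data.decode("utf-8")


# Usage with the snippet from the report (assumes vertexai is initialised):
# encoded_string = load_audio_as_base64_str("test.mp3")
# audio_file = Part.from_data(encoded_string, "audio/mpeg")
```

The helper only changes the Python type of the encoded payload (bytes to str); the encoded content itself is unchanged.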

bent-verbiage commented 6 months ago

Thanks @gericdong, that helped: it works now.

GoogleCloudPlatform / generative-ai