GoogleCloudPlatform / generative-ai

Sample code and notebooks for Generative AI on Google Cloud, with Gemini on Vertex AI
https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview
Apache License 2.0

`intro_multimodal_rag_utils` has no error handling for safety blocks #351

Closed TMRolle closed 6 months ago

TMRolle commented 7 months ago

Relevant file: gemini/use-cases/retrieval-augmented-generation/utils/intro_multimodal_rag_utils.py

When a document is processed with get_document_metadata() and any of its content is blocked by the Gemini model's safety filters, the model returns a content object with no parts. Accessing the .text property on that object then raises ValueError("Content has no parts").

It looks like this is not currently handled, resulting in get_document_metadata() failing for the whole doc if any of the content trips a safety filter. This was discovered when the safety filters were tripped by a court document referencing criminal activity (which I assume would be considered a false positive).

I'm not very familiar with that module or the use cases it's expected to cover, but in these cases it would probably be best to catch that exception, emit a warning that portions of the document failed to process, and continue processing the rest of the doc.

holtskinner commented 7 months ago

@TMRolle thanks for the catch!

Looks like this is the cause of the issue in #344.

@lavinigam-gcp Can you take a look and try to find a workaround for this behavior?

lavinigam-gcp commented 6 months ago

I'm working on a fix for this now and will link the PR here once it's ready.