Closed TMRolle closed 6 months ago
@TMRolle thanks for the catch!
Looks like this is the cause of the issue in #344.
@lavinigam-gcp Can you take a look and try to find a workaround for this behavior?
I'm working on a fix for this now and will link the PR once it's up.
Relevant file:
gemini/use-cases/retrieval-augmented-generation/utils/intro_multimodal_rag_utils.py
When a document is processed using get_document_metadata(), if any of the content in that document is blocked by the Gemini model due to safety filters, it will return a content object without a .text property, and will raise a ValueError("Content has no parts") if you attempt to reference that field.

It looks like this is not currently handled, resulting in get_document_metadata() failing for the whole doc if any of the content trips a safety filter. This was discovered when the safety filters were tripped by a court document referencing criminal activity (which I assume would be considered a false positive).

I'm not very familiar with that module or what use cases it's expected to cover, but I assume it's probably ideal in these cases to catch that exception and output a warning indicating that portions of the document failed to process, while continuing to process the remaining portions of the doc.
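
For illustration, here is a minimal sketch of the catch-and-warn behavior suggested above. It is not the actual code in intro_multimodal_rag_utils.py: FakeResponse, safe_text, and collect_texts are hypothetical names, and FakeResponse only imitates the reported behavior (reading .text raises ValueError("Content has no parts") when content was blocked) so the example runs without the SDK.

```python
import warnings
from typing import Iterable, List, Optional


class FakeResponse:
    """Hypothetical stand-in for a model response; not the real SDK class."""

    def __init__(self, text: Optional[str]):
        self._text = text

    @property
    def text(self) -> str:
        # Imitates the behavior described in this issue: a blocked
        # response has no parts, so reading .text raises ValueError.
        if self._text is None:
            raise ValueError("Content has no parts")
        return self._text


def safe_text(response, label: str) -> Optional[str]:
    """Return response.text, or None (with a warning) if it cannot be read."""
    try:
        return response.text
    except ValueError as exc:
        warnings.warn(f"Skipping {label}: {exc}")
        return None


def collect_texts(responses: Iterable) -> List[str]:
    """Gather text from every readable response, skipping blocked ones."""
    texts = []
    for i, response in enumerate(responses):
        text = safe_text(response, f"chunk {i}")
        if text is not None:
            texts.append(text)
    return texts
```

With something like this, one blocked page or image would produce a warning instead of aborting the whole document. Whether to skip the chunk or substitute placeholder text is a design choice for the maintainers.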