GoogleCloudPlatform / vertex-ai-samples

Notebooks, code samples, sample apps, and other resources that demonstrate how to use, develop and manage machine learning and generative AI workflows using Google Cloud Vertex AI.
https://cloud.google.com/vertex-ai
Apache License 2.0

[Vertex AI GenAI]: Unable to Access Safety Settings for Blocked Responses (Response candidate content has no parts) #3140

Open fikrisandi opened 2 months ago

fikrisandi commented 2 months ago

Expected Behavior

When a response is blocked by the safety filters as prohibited content (finish_reason: "PROHIBITED_CONTENT"), the Vertex AI Generative AI SDK should still return a usable response object. It should also provide access to the safety metadata (finish_reason, safety_ratings, etc.) so the reason for the block can be understood. In short, I want to handle the block by reading finish_reason, safety_ratings, and the other candidate fields instead of having to catch an exception.
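
Roughly the handling pattern I have in mind (just a sketch; the FinishReason comparison and the exact field access are my assumptions about the vertexai.generative_models SDK, not confirmed behavior):

from vertexai.generative_models import FinishReason

response = chat.send_message(prompt)  # ideally no exception even when blocked
candidate = response.candidates[0]

if candidate.finish_reason == FinishReason.STOP:
    print(candidate.text)
else:
    # Blocked: inspect the metadata instead of handling a ValueError.
    print("finish_reason:", candidate.finish_reason)
    for rating in candidate.safety_ratings:
        print(rating.category, rating.probability, rating.severity)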

Actual Behavior

Currently the SDK raises a ValueError for the safety block instead of returning the safety-block information (e.g. finish_reason, safety_ratings, and so on), which would be useful for understanding why the response was blocked.

My Code

import vertexai
from vertexai.generative_models import GenerativeModel, Part, ChatSession, Content
import vertexai.preview.generative_models as generative_models

# vertexai.init(project="my-project", location="us-central1")  # placeholder project/location

safety_settings = {
    generative_models.HarmCategory.HARM_CATEGORY_HATE_SPEECH: generative_models.HarmBlockThreshold.BLOCK_NONE,
    generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: generative_models.HarmBlockThreshold.BLOCK_NONE,
    generative_models.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: generative_models.HarmBlockThreshold.BLOCK_NONE,
    generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT: generative_models.HarmBlockThreshold.BLOCK_NONE,
}

model = GenerativeModel(
    "gemini-1.5-pro-001",
    safety_settings=safety_settings,
)

def get_chat_response(chat: ChatSession, prompt: str) -> str:
    # Stream the answer and collect the text of each chunk.
    text_response = []
    responses = chat.send_message(prompt, stream=True)
    for chunk in responses:
        text_response.append(chunk.text)  # this is where the ValueError is raised for blocked candidates
    return "".join(text_response)

question = "..."  # the user's question (placeholder)
parts = [Part.from_text("...")]  # the previous model answer (placeholder)

# Build the chat history from the previous turns.
history = []
history.append(Content(role="user", parts=[Part.from_text(f"Question: {question}")]))
history.append(Content(role="model", parts=parts))

chat = model.start_chat(history=history, response_validation=False)

response = get_chat_response(chat, question)

Output Code

If I send a harmful prompt (for example, one involving pedophilia), the program produces output like this:

ValueError: Cannot get the response text.
Cannot get the Candidate text.
Response candidate content has no parts (and thus no text). The candidate is likely blocked by the safety filters.
Content:
{}
Candidate:
{
  "finish_reason": "PROHIBITED_CONTENT",
  "safety_ratings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "probability": "NEGLIGIBLE",
      "probability_score": 0.3471424,
      "severity": "HARM_SEVERITY_LOW",
      "severity_score": 0.26907063
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "probability": "NEGLIGIBLE",
      "probability_score": 0.29141697,
      "severity": "HARM_SEVERITY_NEGLIGIBLE",
      "severity_score": 0.14067052
    },
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "probability": "LOW",
      "probability_score": 0.58984894,
      "severity": "HARM_SEVERITY_LOW",
      "severity_score": 0.22227517
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "probability": "NEGLIGIBLE",
      "probability_score": 0.2919425,
      "severity": "HARM_SEVERITY_LOW",
      "severity_score": 0.34954578
    }
  ]
}
Response:
{
  "candidates": [
    {
      "finish_reason": "PROHIBITED_CONTENT",
      "safety_ratings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE",
          "probability_score": 0.3471424,
          "severity": "HARM_SEVERITY_LOW",
          "severity_score": 0.26907063
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE",
          "probability_score": 0.29141697,
          "severity": "HARM_SEVERITY_NEGLIGIBLE",
          "severity_score": 0.14067052
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "LOW",
          "probability_score": 0.58984894,
          "severity": "HARM_SEVERITY_LOW",
          "severity_score": 0.22227517
        },
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE",
          "probability_score": 0.2919425,
          "severity": "HARM_SEVERITY_LOW",
          "severity_score": 0.34954578
        }
      ]
    }
  ]
}
Thank You

dubby1994 commented 2 months ago

https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-attributes

That page describes non-configurable safety filters, which block child sexual abuse material (CSAM) and personally identifiable information (PII); these filters cannot be turned off with safety_settings.
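
Since those blocks cannot be disabled, a minimal workaround sketch for the streaming case, reusing the imports and chat object from the code above (the try/except placement and field access are my own suggestion, not an official pattern):

def get_chat_response(chat: ChatSession, prompt: str) -> str:
    text_response = []
    responses = chat.send_message(prompt, stream=True)
    for chunk in responses:
        try:
            text_response.append(chunk.text)
        except ValueError:
            # Blocked chunk: the candidate has no parts, but its metadata is still accessible.
            candidate = chunk.candidates[0]
            print("Blocked, finish_reason:", candidate.finish_reason)
            for rating in candidate.safety_ratings:
                print(rating.category, rating.probability, rating.severity)
            break
    return "".join(text_response)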