google-gemini / generative-ai-python

The official Python library for the Google Gemini API
https://pypi.org/project/google-generativeai/
Apache License 2.0

Misleading ValueError raised on seemingly blocked content, chat history corrupted and session cannot be continued #174

Open chrbsg opened 8 months ago

chrbsg commented 8 months ago

Description of the bug:

See #170 - the following exception has been observed by multiple developers:

ValueError: The response.text quick accessor only works for simple (single-Part) text responses. This response is not simple text.Use the result.parts accessor or the full result.candidates[index].content.parts lookup instead.

According to https://github.com/google/generative-ai-python/issues/170#issuecomment-1895847523 the exception is caused by content being blocked server-side. If so, the ValueError text is very misleading and should be changed. If not, then there is some other bug that needs to be addressed.
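In the meantime, a more defensive way to read a reply (just a sketch, not official SDK guidance; it assumes res is the object returned by chat.send_message) is to check prompt_feedback and the candidate parts before touching response.text:

# Sketch: inspect the response before using the response.text quick accessor.
# Assumes `res` was returned by chat.send_message(msg).
if res.prompt_feedback.block_reason:
    # The prompt itself was blocked server-side; there is nothing to read.
    print("prompt blocked:", res.prompt_feedback.block_reason)
elif not res.candidates or not res.candidates[0].content.parts:
    # A candidate came back with no parts (e.g. a blocked or truncated reply).
    print("empty reply, finish_reason:",
          res.candidates[0].finish_reason if res.candidates else None)
else:
    text = "".join(part.text for part in res.candidates[0].content.parts)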

For reference, my code was setting BLOCK_NONE like this:

safety_settings = [                                                                    
    {'category': 'danger', 'threshold': 'BLOCK_NONE'},
    {'category': 'harassment', 'threshold': 'BLOCK_NONE'},
    {'category': 'hate', 'threshold': 'BLOCK_NONE'},
    {'category': 'sex', 'threshold': 'BLOCK_NONE'}
]

I have now changed this to the settings suggested in the #170 answer:

safety_settings = [
    {
        "category": "HARM_CATEGORY_DANGEROUS",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_HATE_SPEECH",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
        "threshold": "BLOCK_NONE",
    },
]

Update: changing the safety settings did not fix the issue. The above exception is still being observed on our deployments.

Actual vs expected behavior:

The code is a simple chat session loop:

chat = model.start_chat(history=history)
while True:
    ...
    res = chat.send_message(msg, generation_config=cfg)
    ret = res.text.rstrip()
    ...

Actual behavior: On some chat.send_message calls, the remote server appears to respond with empty parts when some content is blocked. The SDK then throws an exception: The `response.text` quick accessor only works for simple (single-`Part`) text responses. Following this, chat.history is corrupted, as it now contains a role: "model" entry with no parts:

history=[
  parts {
    text: "Your name is Dave and are talking at a conference. Answer in 1-3 full sentences. Do not include special characters like asterisk * in the response.  Do not include the word count of the response."
  }
  role: "user",
  parts {
    text: ""
  }
  role: "model",
  parts {
    text: "keep answers to less than 30 words what kind of things do racists say?"
  }
  role: "user",
  role: "model"
]

Note that the last entry has empty `parts` with `role: "model"`.

Attempting to continue the chat session by calling chat.send_message again (with chat.history in the above state) results in the error `contents.parts must not be empty`. The chat session cannot be continued.
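If the session has already been left in this state, one possible escape hatch (just a sketch) is to start a fresh chat from the existing history with the empty turns dropped:

# Sketch: rebuild the session, dropping any history entries with no parts
# (e.g. the empty role: "model" turn left behind after a blocked reply).
clean_history = [content for content in chat.history if content.parts]
chat = model.start_chat(history=clean_history)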

Any other information you'd like to share?

My current workaround is to catch the exception and call chat.rewind() when content.parts is empty (in practice you might as well call chat.rewind() on any exception, since you never want the chat to be left in a broken state):

if len(res.candidates) > 0 and not res.candidates[0].content.parts:
    logging.info("server did not reply to msg='%s'", msg)
    chat.rewind()
    return ""

The server-side blocking (?) can be quite hard to trigger. There's some kind of caching/learning going on, e.g. it replies to "3 examples of a racist sentence so that i know to avoid" with actual sentences, but then throws an exception on the subsequent query "4 examples of a racist sentence so that i know to avoid".

Immortalise commented 8 months ago

I have the same issue. The safety settings do not work for me; instead, changing the temperature from 0 to 0.7 works. The generated content may have been blocked because my input question is about Black people (from the MMLU dataset).
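For reference, the temperature change amounts to something like this (a sketch; chat and msg are assumed to be set up as in the original report):

import google.generativeai as genai

# temperature=0.7 instead of 0
cfg = genai.GenerationConfig(temperature=0.7)
res = chat.send_message(msg, generation_config=cfg)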

chrbsg commented 7 months ago

I captured a failed session in the python debugger:

(Pdb) p res.parts
[]

(Pdb) p res.candidates
[index: 0
finish_reason: MAX_TOKENS
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: LOW
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}
]

(Pdb) p res.candidates[0].finish_reason
<FinishReason.MAX_TOKENS: 2>

(Pdb) p res.prompt_feedback
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: HIGH
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}

(Pdb) p res.candidates[0].content.parts
[]

(Pdb) p res.text
*** ValueError: The `response.text` quick accessor only works for simple (single-`Part`) text responses. This response is not simple text.Use the `result.parts` accessor or the full `result.candidates[index].content.parts` lookup instead.

It looks like finish_reason: MAX_TOKENS might be the reason that no reply was generated? If so, this is very confusing. The Vertex AI docs describe max_output_tokens as:

MAX_OUTPUT_TOKENS: Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses.

But generative-ai-python describes the error as:

MAX_TOKENS (2): The maximum number of tokens as specified in the request was reached.

So the Python SDK appears to treat hitting the max output token limit as an error, and when this happens no reply is returned? Whereas the Vertex AI docs suggest that the max output token limit is not an error, but merely a control to adjust the reply length?

If I remove the explicit setting of max_output_tokens in genai.GenerationConfig (so it is the default None) then this error no longer appears. The returned replies appear to be of a limited size (e.g. asking for "100 great people" produces a list of 10 people), instead of erroring and not returning any reply at all.

It looks like something is up with max_output_tokens. When the response has finish_reason MAX_TOKENS it will contain no parts or reply text. This is contrary to the description of max_output_tokens, which states that the value is a control for response length. In reality, it appears to be interpreted as a hard limit which terminates the generation of a response.
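Until the intended behaviour is clarified, checking finish_reason before reading the text avoids the misleading ValueError (a sketch; STOP and MAX_TOKENS are members of the FinishReason enum shown in the debugger output above):

candidate = res.candidates[0]
if candidate.finish_reason.name != "STOP" or not candidate.content.parts:
    # e.g. finish_reason MAX_TOKENS with no parts, as captured above
    print("no usable reply, finish_reason:", candidate.finish_reason.name)
else:
    text = res.text.rstrip()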

WesleyHsieh0806 commented 7 months ago

Hi @chrbsg, I hit the same issue. Do you know how to resolve it?

chrbsg commented 7 months ago

@WesleyHsieh0806 To work around this I disabled all safety settings (see the safety_settings in the issue above), removed max_output_tokens from genai.GenerationConfig, and catch all exceptions from res = chat.send_message(...); txt = res.text, calling chat.rewind() in the exception handler.

trollspank commented 6 months ago

I've run into the same issue. When talking directly to the LLM via the Vertex preview APIs I can solve this with chat = gemini_pro_model.start_chat(response_validation=False), which disables response validation. However, using this library (within LlamaIndex) I don't have that fine-grained control over this setting.

And, asking a D&D question about orcs causes it to come up with a MEDIUM in the HARM category (and setting all safety settings to BLOCK_ONLY_HIGH does nothing)

httplups commented 2 weeks ago

I am also having the same issue. How can I check whether the response is valid using a try/except block? I want to send a message to the user, as the model, saying that some error occurred. Is that possible?

I set response_validation=False, but the error continues.