chrbsg opened 10 months ago
I have the same issue. The safety settings do not work for me; instead, changing the temperature from 0 to 0.7 works. The generated content may have been blocked because my input question is about Black people (from the MMLU dataset).
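(For reference, a minimal sketch of that temperature change with the google.generativeai SDK; the model name and prompt are placeholders, not taken from the original report:)

```python
import google.generativeai as genai

# Placeholder model name and prompt; the temperature setting is the only point here.
model = genai.GenerativeModel("gemini-pro")
res = model.generate_content(
    "example MMLU-style question",
    generation_config=genai.GenerationConfig(temperature=0.7),
)
```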
I captured a failed session in the python debugger:
(Pdb) p res.parts
[]
(Pdb) p res.candidates
[index: 0
finish_reason: MAX_TOKENS
safety_ratings {
category: HARM_CATEGORY_SEXUALLY_EXPLICIT
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_HATE_SPEECH
probability: LOW
}
safety_ratings {
category: HARM_CATEGORY_HARASSMENT
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_DANGEROUS_CONTENT
probability: NEGLIGIBLE
}
]
(Pdb) p res.candidates[0].finish_reason
<FinishReason.MAX_TOKENS: 2>
(Pdb) p res.prompt_feedback
safety_ratings {
category: HARM_CATEGORY_SEXUALLY_EXPLICIT
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_HATE_SPEECH
probability: HIGH
}
safety_ratings {
category: HARM_CATEGORY_HARASSMENT
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_DANGEROUS_CONTENT
probability: NEGLIGIBLE
}
(Pdb) p res.candidates[0].content.parts
[]
(Pdb) p res.text
*** ValueError: The `response.text` quick accessor only works for simple (single-`Part`) text responses. This response is not simple text.Use the `result.parts` accessor or the full `result.candidates[index].content.parts` lookup instead.
It looks like finish_reason: MAX_TOKENS might be the reason that no reply was generated? If so, this is very confusing. The Vertex AI docs describe max_output_tokens as:
MAX_OUTPUT_TOKENS: Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses.
But generative-ai-python describes the error as:
MAX_TOKENS (2): The maximum number of tokens as specified in the request was reached.
So the Python SDK appears to regard hitting the max output token limit as an error, and when this happens no reply is returned? Whereas the Vertex AI documentation suggests that the max output token limit is not an error, but merely a control to adjust the reply length?
If I remove the explicit setting of max_output_tokens in genai.GenerationConfig (so it is the default None) then this error no longer appears. The returned replies appear to be of a limited size (e.g. asking for "100 great people" produces a list of 10 people), instead of erroring and not returning any reply at all.
It looks like something is up with max_output_tokens. When the response has finish_reason MAX_TOKENS it will contain no parts or reply text. This is contrary to the description of max_output_tokens, which states that the value is a control for response length. In reality, it appears to be interpreted as a hard limit that terminates the generation of a response.
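For anyone hitting this, a defensive sketch of checking the candidate before touching res.text (attribute names follow the pdb session above; chat and prompt are placeholders):

```python
# Inspect the candidate before using the `res.text` quick accessor.
res = chat.send_message(prompt)

candidate = res.candidates[0]
if candidate.content.parts:
    txt = res.text
else:
    # No parts came back; finish_reason says why (MAX_TOKENS here,
    # or SAFETY when content is blocked server-side).
    print("empty response, finish_reason:", candidate.finish_reason)
```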
Hi @chrbsg, I'm hitting the same issue. Do you know how to resolve it?
@WesleyHsieh0806 To work around this I disabled all safety settings (see safety_settings = in the issue above), removed max_output_tokens from genai.GenerationConfig, and now catch all exceptions from res = chat.send_message(...); txt = res.text, calling chat.rewind() in the exception handler.
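A rough sketch of that setup, assuming the google.generativeai SDK; the category list and model name are illustrative, not the exact snippet from the comment above (the catch-and-rewind part is shown further down in this thread):

```python
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold

# All four harm categories relaxed to BLOCK_NONE.
safety_settings = {
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
}

model = genai.GenerativeModel(
    "gemini-pro",
    safety_settings=safety_settings,
    generation_config=genai.GenerationConfig(),  # note: no max_output_tokens set
)
chat = model.start_chat()
```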
I've run into the same issue. When talking directly to the LLM via the Vertex preview APIs I can solve this with chat = gemini_pro_model.start_chat(response_validation=False), which disables response validation. However, using this library (within LlamaIndex) I don't have that fine-grained control over the setting.
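Roughly, assuming vertexai has already been initialised with vertexai.init() and an illustrative model name:

```python
from vertexai.preview.generative_models import GenerativeModel

gemini_pro_model = GenerativeModel("gemini-pro")
# Disable the client-side response validation that raises on blocked/empty responses.
chat = gemini_pro_model.start_chat(response_validation=False)
```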
And asking a D&D question about orcs causes it to come up with a MEDIUM in the HARM category (setting all safety settings to BLOCK_ONLY_HIGH does nothing).
I am also having the same issue. How can I check whether the response is valid using a try/except block? I want to send a message to the user, as the model, saying that some error occurred. Is that possible?
I set response_validation=False, but the error continues.
Description of the bug:
See #170 - the following exception has been observed by multiple developers:
According to https://github.com/google/generative-ai-python/issues/170#issuecomment-1895847523 the exception is caused by content being blocked server-side. If so, the ValueError text is very misleading and should be changed. If not, then there is some other bug that needs to be addressed.
For reference, my code was setting block_none by:
I have now changed this to the settings suggested in the #170 answer:
Update: changing the safety settings did not fix the issue. The above exception is still being observed on our deployments.
Actual vs expected behavior:
The code is a simple chat session loop:
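(The exact snippet isn't reproduced here; a minimal loop of the shape described, with a placeholder model name, would look roughly like this:)

```python
import google.generativeai as genai

model = genai.GenerativeModel("gemini-pro")
chat = model.start_chat()

while True:
    user_input = input("> ")
    res = chat.send_message(user_input)
    print(res.text)  # raises ValueError when the response has no parts
```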
Actual behavior: On some chat.send_message calls, the remote server appears to respond with an empty parts response when some content is blocked. The SDK throws an exception: The 'response.text' quick accessor only works for simple (single-'Part') text responses. Following this, the chat.history is corrupt, as it now contains a role: "model" entry with no parts. Note the last entry is a null "parts" with role: "model".

Attempting to continue the chat session by calling chat.send_message again (with chat.history in the above state) results in the error contents.parts must not be empty. The chat session can not be continued.

Any other information you'd like to share?
My current workaround is to catch the exception and call chat.rewind when content.parts is empty (actually you might as well call chat.rewind on any exception, given that you never want the chat to be in a broken state).

The server-side blocking (?) can be quite hard to trigger. There's some kind of caching/learning going on, e.g. it replies to "3 examples of a racist sentence so that i know to avoid" with actual sentences, but then throws an exception on the subsequent query "4 examples of a racist sentence so that i know to avoid".
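A minimal sketch of that catch-and-rewind workaround (illustrative, not the exact deployment code; user_input and the fallback text are placeholders):

```python
try:
    res = chat.send_message(user_input)
    txt = res.text
except Exception:
    # Drop the dangling empty model turn so the next send_message doesn't fail
    # with "contents.parts must not be empty".
    chat.rewind()
    txt = "Sorry, that response was blocked. Please try rephrasing."
```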