google-gemini / generative-ai-python

The official Python library for the Google Gemini API
https://pypi.org/project/google-generativeai/
Apache License 2.0

Variability in Responses with top_k=1 Parameter in Gemini Pro Model #192

Closed: storybite closed this issue 6 months ago

storybite commented 8 months ago

Description of the bug:

Greetings,

While experimenting with the GenerationConfig parameters of the Gemini Pro model, I've noticed unexpected variability in the outputs generated with the top_k=1 setting, in contrast to the near-consistent responses observed with top_p=0.

Detailed Explanation: Testing showed that top_p=0 yields near-consistent outputs for the same input, which is expected, since it narrows generation to the most probable token. The top_k=1 setting, however, does not show the same consistency. This is surprising, because top_k=1 restricts sampling to the single highest-probability token at each step and should therefore also produce near-deterministic outputs for identical requests.
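
For intuition, here is a minimal self-contained sketch (toy probabilities invented for illustration, not values from the Gemini API) of how the two truncation strategies filter the token distribution at a single decoding step:

# Toy sketch: how top-k and top-p truncation filter one decoding step.
# The distribution below is illustrative only.
probs = {"star": 0.4, "moon": 0.3, "dream": 0.2, "wind": 0.1}

def top_k_filter(probs, k):
    # Keep only the k highest-probability tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def top_p_filter(probs, p):
    # Keep the smallest prefix of tokens whose cumulative probability reaches p.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        total += prob
        if total >= p:
            break
    return kept

print(top_k_filter(probs, k=1))  # {'star': 0.4}: only the argmax survives
print(top_p_filter(probs, p=0))  # {'star': 0.4}: p=0 likewise keeps only the top token

Both settings leave a single candidate, so sampling degenerates to greedy decoding; this is why I expected top_k=1 to be as consistent as top_p=0.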

Below is the Python code snippet illustrating the tests conducted and highlighting the difference in response consistency:

import os
import google.generativeai as genai

# Assumes an API key is available in the environment.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel(model_name='gemini-pro')
user_message = "Write a one-sentence poem"

# Testing for near-consistent responses with top_p=0
print("\ntop_p=0:")
generation_config = genai.GenerationConfig(top_p=0)
for _ in range(3):
    response = model.generate_content(user_message, generation_config=generation_config)
    print(f'{"_"*20}\n{response.text}')

# Testing for variability with top_k=1
print("\ntop_k=1:")
generation_config = genai.GenerationConfig(top_k=1)
for _ in range(3):
    response = model.generate_content(user_message, generation_config=generation_config)
    print(f'{"_"*20}\n{response.text}')
Output:

top_p=0:
____________________
In the vast expanse, a star whispers its tale.
____________________
In the vast expanse, a star whispers its tale.
____________________
In the vast expanse, a star whispers its tale.

top_k=1:
____________________
In a world of colors, a heart beats, a story unfolds.
____________________
In cosmic expanse, a flicker of light, a tale untold.
____________________
In twilight's embrace, dreams whisper of distant stars.

Actual vs expected behavior:

Expected behavior: top_k=1 should produce near-consistent responses for the same input, similar to the behavior observed with top_p=0.

Actual behavior: top_k=1 produces significantly varied responses for identical inputs, contrary to the expected near-consistency.

Any other information you'd like to share?

No response

ymodak commented 8 months ago

Hi @storybite, by default the model uses temperature=0.9, top_p=1.0, top_k=1. So you may try setting:

generation_config = genai.GenerationConfig(top_k=1, top_p=0)
for _ in range(3):
    response = model.generate_content(user_message, generation_config=generation_config)
    print(f'{"_"*20}\n{response.text}')    

Output -

top_p=0, top_k=1:
____________________
In the vast expanse, a star whispers its tale.
____________________
In the vast expanse, a star whispers its tale.
____________________
In the vast expanse, a star whispers its tale.

Also, I think the temperature attribute is the more appropriate control for output variation. You may try setting temperature=0 to get consistent output, and raise it for more varied responses.
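
For example (a sketch reusing model and user_message from the snippet above):

# Sketch: pin temperature=0 for (near-)deterministic output.
generation_config = genai.GenerationConfig(temperature=0)
for _ in range(3):
    response = model.generate_content(user_message, generation_config=generation_config)
    print(f'{"_"*20}\n{response.text}')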

storybite commented 8 months ago

Hi @ymodak.

I'm aware that setting temperature=0 or top_p=0 can yield the desired outcome. My point is about the top_k parameter the Google Gemini API exposes: if it is exposed, it should behave meaningfully, but it does not appear to work as expected. If it doesn't work properly, it might be better not to expose top_k at all, as the GPT series does.

Top-k sampling restricts the candidate pool to the k highest-probability tokens, so top_k=1 should behave like top_p=0 (effectively greedy decoding). If top_k doesn't work as documented, I may reconsider using it in the future.

cyr0930 commented 8 months ago

same here

github-actions[bot] commented 7 months ago

Marking this issue as stale since it has been open for 14 days with no activity. This issue will be closed if no further activity occurs.

storybite commented 7 months ago

I'm still interested in resolving this issue.

cyr0930 commented 7 months ago

Even with (temperature=0.0, top_p=0.0, top_k=1) set, the model sometimes generates different outputs.
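
To quantify this, a small harness can count the distinct outputs produced by identical requests (a sketch: the helper name is hypothetical, it reuses model and user_message from the snippets above, and n=10 is an arbitrary sample size):

from collections import Counter

def count_distinct_outputs(model, user_message, generation_config, n=10):
    # Issue the same request n times and tally each distinct response text.
    outputs = Counter()
    for _ in range(n):
        response = model.generate_content(user_message, generation_config=generation_config)
        outputs[response.text] += 1
    return outputs

config = genai.GenerationConfig(temperature=0.0, top_p=0.0, top_k=1)
print(count_distinct_outputs(model, user_message, config))  # more than one key means nondeterminism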

github-actions[bot] commented 6 months ago

Marking this issue as stale since it has been open for 14 days with no activity. This issue will be closed if no further activity occurs.

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 28 days. Please post a new issue if you need further assistance. Thanks!

rmqrmqrmq commented 5 months ago

Does anybody know how to solve this issue? It is a significant problem for code-processing tasks.