abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Gemma-2 with Temperature=0 gives different output #1606

Open junsiknss opened 1 month ago

junsiknss commented 1 month ago

Version: llama-cpp-python==0.2.82
Model: "bartowski/gemma-2-9b-it-GGUF/gemma-2-9b-it-Q8_0.gguf"

When I load the gemma2 model with temperature=0 and run a simple prompt, it always gives the same output.

However, when I enter long, complex prompts, the output changes. Confusingly, if I repeat the same long prompt multiple times, the first and second results differ, but after a few more iterations the output stops changing.

When I loaded the same model directly with llama-cli from llama.cpp, I didn't have this problem. I also didn't have this problem with llama-3. What could be the cause?

My test code is below:

from llama_cpp import Llama

llm = Llama(model_path="~/.cache/huggingface//hub/models--bartowski--gemma-2-9b-it-GGUF/snapshots/d731033f3dc4018261fd39896e50984d398b4ac5/gemma-2-9b-it-Q8_0.gguf", chat_format="gemma", n_batch=512, n_ctx=8192, n_gpu_layers=-1)

output = llm("""You are a renowned historian specializing in ancient civilizations, and you have been invited to give a lecture on the political, social, and technological advancements of the Sumerian civilization, particularly during the Ur III period (circa 2100-2000 BCE). Your audience includes scholars, students, and history enthusiasts. Your lecture should cover the following key areas in detail:
        1.      Political Structure: Describe the centralized administration, the role of the king, and the bureaucratic system.
        2.      Economic Systems: Explain the agricultural innovations, trade networks, and economic policies.
        3.      Social Hierarchy: Discuss the class structure, roles of different social groups, and gender dynamics.
        4.      Technological and Scientific Contributions: Highlight key inventions, advances in mathematics and astronomy, and architectural achievements.
        5.      Cultural and Religious Practices: Explore their religious beliefs, major deities, and cultural contributions like literature and art.
Ensure your lecture is comprehensive, well-researched, and engaging, providing clear examples and drawing connections to how these advancements influenced later civilizations.""", # Prompt
      max_tokens=8192, # Generate up to 8192 tokens; set to None to generate up to the end of the context window
      echo=False, # Do not echo the prompt back in the output
      temperature=0.0 # Expected to make decoding deterministic
) # Generate a completion; create_completion can also be called
print(output)

# When I run the same prompt again to get output2, it differs from the output above.

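The "first runs differ, later runs agree" pattern described above can be checked with a small harness. This is only a sketch: `summarize_runs` and `make_fake_generate` are hypothetical helpers, not part of llama-cpp-python, and the fake generator merely imitates the reported pattern so the snippet runs without a model.

```python
from collections import Counter


def summarize_runs(generate, prompt, runs=5):
    """Call generate(prompt) `runs` times and compare every output to the first."""
    outputs = [generate(prompt) for _ in range(runs)]
    matches_first = [out == outputs[0] for out in outputs]
    distinct = len(Counter(outputs))  # number of different outputs seen
    return outputs, matches_first, distinct


def make_fake_generate():
    """Hypothetical stand-in for a model: first call differs, later calls agree."""
    calls = {"n": 0}

    def fake_generate(prompt):
        calls["n"] += 1
        return "A" if calls["n"] == 1 else "B"

    return fake_generate


_, matches, distinct = summarize_runs(make_fake_generate(), "prompt", runs=4)
print(matches, distinct)
```

With a real model, `generate` could be something like `lambda p: llm(p, max_tokens=256, temperature=0.0)["choices"][0]["text"]`, reusing the `llm` object from the test code above; the `max_tokens` value here is an arbitrary choice.
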
yamikumo-DSD commented 1 month ago

I'm not sure what causes this behavior, but in practice, temperature=0.0 doesn't mean $temperature=0.0$. Its true meaning is $temperature=\epsilon$, where $\epsilon$ is a very small number. This is to avoid division by zero. Hence, the overall next-token distribution is not a true $\delta(x)$.
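To make the $\epsilon$ argument concrete, here is a minimal sketch of a temperature-scaled softmax in plain Python. It assumes the usual definition (logits divided by the temperature before the softmax); the logits are made-up values, and none of this is taken from the llama.cpp sampler, whose handling of temperature=0 is not confirmed in this thread.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature), shifted by the max for stability."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative logits only: a clear winner, then a near-tie.
clear_winner = [2.0, 1.9, 0.5]
near_tie = [2.0, 2.0 - 1e-7]

for t in (1.0, 0.1, 1e-6):
    probs = softmax_with_temperature(clear_winner, t)
    print("clear winner, T =", t, [round(p, 6) for p in probs])

# With a gap comparable to epsilon, both tokens keep noticeable probability.
print("near tie, T = 1e-6", softmax_with_temperature(near_tie, 1e-6))
```

Under this assumption, a tiny $\epsilon$ pushes essentially all probability onto the top token unless two logits differ by an amount comparable to $\epsilon$. In that case the distribution is not a delta, and small numerical differences in the logits between runs could change which token is picked.
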