EleutherAI / gpt-neo

An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
https://www.eleuther.ai
MIT License

The temperature at 0.0001 (or other arbitrarily small float) is still too high #270

Open monsieurpooh opened 2 years ago

monsieurpooh commented 2 years ago

If I set the temperature to 0.00001 or a similarly small float, the output is noticeably less chaotic than with a significantly larger temperature; however, it is still very non-deterministic and often answers questions wrong even when it would get them right the majority of the time. I suspect it would be better to have more freedom over the temperature range, so that 0.00001 actually denotes an extremely low temperature with almost no variation in the output, for better question-answering capability.

If anyone knows of a workaround to this, please let me know.

monsieurpooh commented 2 years ago

I am trying to fix this bug by delving into the code on my end, but I can't even figure out where the code lives. The first line of my script is `from transformers import GPTNeoForCausalLM, GPT2Tokenizer`, but I can't find where `GPTNeoForCausalLM` is defined in transformers. A text search of the whole library folder turns up empty.

monsieurpooh commented 2 years ago

I found out how to browse the source code, but now I am confused about how dividing all the scores by a common value can change the ranking or the final result: https://github.com/huggingface/transformers/blob/87e6e4fe5c7e65cb69e70306f22de6daf16b6e14/src/transformers/generation_logits_process.py#L141
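(For context, not from the thread: dividing the logits by the temperature indeed never changes their ranking, but it does change how peaked the softmax distribution is, which matters when tokens are *sampled* rather than argmaxed. A minimal sketch in plain Python, with made-up logits:)

```python
import math

def softmax(logits):
    # numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical next-token logits
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.5, 1e-6):
    probs = softmax([x / t for x in logits])
    print(t, [round(p, 3) for p in probs])

# The ranking never changes, but as t -> 0 the distribution
# concentrates almost all mass on the top-ranked token, so
# sampling from it approaches greedy (deterministic) decoding.
```

So a low temperature only makes the sampler *unlikely* to pick a non-top token; it never rules it out exactly, which matches the observed residual randomness.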

monsieurpooh commented 2 years ago

This is not really a bug. I found out I had to go way lower, lower than a typical "float" one would expect most programming languages to be able to handle. I specified 0.00000000000001 as the temperature and now the output is pretty consistent.

monsieurpooh commented 2 years ago

I would like to reopen this issue because, in some situations with long prompts, even 1e-18 is not small enough to produce a fully deterministic response, and at such a small value the script has a chance of throwing an exception: "probability tensor contains either inf, nan or element < 0".
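(A plausible explanation, my reading rather than something confirmed in the thread: dividing finite logits by an extremely small temperature produces enormous values, and exponentiating them overflows, which is one way `inf`/`nan` can end up in the probability tensor. A toy illustration with a naive softmax that skips the usual max-subtraction stabilization, using made-up logits:)

```python
import math

def naive_softmax(logits):
    # deliberately omits the max-subtraction stabilization
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0]  # hypothetical values

try:
    # 2.0 / 1e-3 = 2000, and math.exp(2000) overflows a double
    probs = naive_softmax([x / 1e-3 for x in logits])
except OverflowError:
    probs = None  # Python raises; a tensor framework would
                  # instead silently produce inf, and inf/inf
                  # or inf - inf then yields nan downstream.
```

That would match the "probability tensor contains either inf, nan or element < 0" message appearing only at extreme temperature values.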

monsieurpooh commented 2 years ago

The workaround is to disable sampling.
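(To make that concrete: greedy decoding picks the argmax at every step, so it is deterministic by construction, whereas temperature sampling stays stochastic no matter how small the temperature is. In `transformers` this corresponds, as far as I can tell, to passing `do_sample=False` to `model.generate(...)`. A self-contained sketch of the difference, with made-up near-tied logits:)

```python
import math
import random

def softmax(logits):
    # numerically stable softmax
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    # temperature sampling: stochastic even at low temperatures
    probs = softmax([x / temperature for x in logits])
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def greedy_token(logits):
    # greedy decoding: always the argmax, fully deterministic
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [2.0, 1.9, 0.5]  # hypothetical near-tied logits
rng = random.Random(0)
draws = {sample_token(logits, 1.0, rng) for _ in range(200)}
# draws typically contains more than one token id,
# while greedy_token(logits) is always the same index.
```

Greedy decoding trades away diversity, but for question answering that is usually exactly the trade you want.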