bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Fix Max New Tokens in HF's Generation Config #257

Open mostafaelhoushi opened 2 months ago

mostafaelhoushi commented 2 months ago

HuggingFace's max_length configuration corresponds to the total length of the prompt and the generated output, while max_new_tokens corresponds to the length of generated output only.

Using args.max_length_generation to set max_length led to runtime errors because the total length of prompt + generation would exceed the intended value. Using args.max_length_generation to set max_new_tokens instead fixed those errors for me.
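
To make the difference concrete, here is a minimal sketch of both calls, assuming a placeholder gpt2 checkpoint and a hypothetical `budget` variable standing in for args.max_length_generation (this is not the harness's actual code path):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
budget = 128  # stands in for args.max_length_generation

# Problematic: the budget covers prompt + generation, so once the prompt
# alone reaches `budget` tokens there is no room left to generate.
out = model.generate(**inputs, max_length=budget)

# Fixed: the budget covers newly generated tokens only, so the prompt
# length no longer eats into it.
out = model.generate(**inputs, max_new_tokens=budget)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```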

kbmlcoding commented 2 months ago

Thanks for fixing it. This is the message I am seeing as well in the logs when running HumanEval against the llama2-7b-chat-hf model:

```
bigcode-evaluation-harness/bigcode_eval/utils.py:361: UserWarning: An error with the following message was thrown: Input length of input_ids is 1000, but max_length is set to 1000. This can lead to unexpected behavior. You should consider increasing max_length or, better yet, setting max_new_tokens.. Returning the input as the generation, for higher scores consider using a larger max_length
2024-07-23 11:50:32 EDT code_eval line: 74: [INFO] warnings.warn(f"An error with the following message was thrown: {e}. Returning the input as the generation, for higher scores consider using a larger max_length")
```

Adding more details for clarity, per the official API docs from HF: https://huggingface.co/docs/transformers/en/main_classes/text_generation

max_length (int, optional, defaults to 20) — The maximum length the generated tokens can have. Corresponds to the length of the input prompt + max_new_tokens. Its effect is overridden by max_new_tokens, if also set.

max_new_tokens (int, optional) — The maximum numbers of tokens to generate, ignoring the number of tokens in the prompt.
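
A small sketch of the override behavior described above, again with a placeholder gpt2 checkpoint and arbitrary values:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello", return_tensors="pt")
prompt_len = inputs["input_ids"].shape[1]

# When both are set, transformers warns and max_new_tokens takes
# precedence: output may grow to prompt_len + 10 tokens, not be capped at 5.
out = model.generate(**inputs, max_length=5, max_new_tokens=10)
assert out.shape[1] <= prompt_len + 10  # generation may stop earlier at EOS
print(out.shape[1])
```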

mostafaelhoushi commented 2 weeks ago

Thanks @kbmlcoding for approving. I'm still unable to merge the PR. Do we need another approval?

Cc @loubnabnl