EQ-Bench

A benchmark for emotional intelligence in large language models

Input length of input_ids is 1211, but `max_length` is set to 1000. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`. Benchmark run failed #14

Closed: Abdullah-kwl closed this issue 8 months ago

Abdullah-kwl commented 8 months ago

In the `config.cfg` file, under `[Benchmarks to run]`, I use the parameters: `myrun1, Llama-v2, meta-llama/Llama-2-7b-chat-hf, , 8bit, 1, transformers, , ,`
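For context, that entry laid out as it sits in `config.cfg`; the field labels in the comment are a guess inferred from the values above, not taken from the repo's documentation:

```ini
[Benchmarks to run]
# Assumed positional fields (inferred from the values, not from the docs):
# run_id, prompt_format, model_path, lora_path, quantization, n_iterations, inference_engine, ...
myrun1, Llama-v2, meta-llama/Llama-2-7b-chat-hf, , 8bit, 1, transformers, , ,
```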

It fails with this error:

Failed to parse scores
99% 170/171 [01:18<00:00, 2.17it/s]
Input length of input_ids is 1211, but `max_length` is set to 1000. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
Benchmark run failed.

I think it is due to the large input token count of question 170 or 171. How can I set `max_length` > 1200? Please update the code or suggest a solution for this error.

sam-paech commented 8 months ago

Hi, you can set the max completion length in `lib/run_bench.py`.
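For anyone else hitting this on the pre-2.1 code, a minimal sketch of the kind of change meant, assuming the transformers path in `run_bench.py` ultimately calls `model.generate()` (the model here is a tiny public test checkpoint, and none of this is the repo's actual code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative stand-in for whatever model the benchmark config names.
model_id = "sshleifer/tiny-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Question: How does the subject feel?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")

# max_new_tokens caps only the completion, so a 1211-token prompt can no
# longer overrun a combined prompt+completion budget like max_length=1000.
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```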

Alternatively, EQ-Bench was just updated to version 2.1 (as of a few hours ago), so you may wish to pull the latest and see if that helps.

Abdullah-kwl commented 8 months ago

I have cloned the updated repo and it solved the issue.

This output is printed:

99% 170/171 [22:09<00:07, 7.64s/it]
Both `max_new_tokens` (=60) and `max_length` (=1196) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
100% 171/171 [22:16<00:00, 7.81s/it]

----Benchmark Complete----
2024-02-27 06:37:14
Time taken: 25.3 mins
Prompt Format: Llama-v2
Model: meta-llama/Llama-2-7b-chat-hf
Score (v2): 36.97
Parseable: 170.0
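The `max_new_tokens`/`max_length` warning in that log is informational: when both limits reach `generate()`, transformers keeps `max_new_tokens` and ignores `max_length`, so the run completes anyway. A small check, again with a tiny public test model rather than anything EQ-Bench actually loads:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sshleifer/tiny-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello", return_tensors="pt")
# Passing both limits reproduces the "seem to have been set" warning;
# generation then stops after at most 60 new tokens, not at 1196 total.
out = model.generate(**inputs, max_new_tokens=60, max_length=1196)
print(out.shape[-1])  # prompt tokens + at most 60 generated tokens
```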

Batch completed
Time taken: 25.3 mins