EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Invalid response for loglikelihood for GGUF model #2213

Open jaslatendresse opened 3 months ago

jaslatendresse commented 3 months ago

I have a llama-2-7b.gguf quantized to Q4_K_M that I serve with llama.cpp's server. Inference itself works fine.

This is the code (adapted from here, where someone ran into a similar issue) that I use to run a test eval (hence the limit param):

import json

from lm_eval import simple_evaluate
from lm_eval.models.gguf import GGUFLM

# Point the harness at the running llama.cpp server
lm = GGUFLM(base_url="http://localhost:8080")
results = simple_evaluate(model=lm, tasks=["french_bench_hellaswag"], device="mps", limit=10)

# Drop the per-sample outputs so the JSON stays small
filtered_results = {key: value for key, value in results.items() if key != "samples"}
with open("results.json", "w") as json_file:
    json_file.write(json.dumps(filtered_results, indent=4))

Every time I run this, I get the following error:

ERROR [gguf.py:96] Invalid response for loglikelihood.

Stack trace:

Traceback (most recent call last):
  File "/Users/jasminelatendresse/exp-os-assistant-redaction/francisation-llm/scripts/eval.py", line 6, in <module>
    results = simple_evaluate(model=lm,tasks=["french_bench_arc_challenge"],device="mps",limit=5) 
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/lm-evaluation-harness/lm_eval/evaluator.py", line 296, in simple_evaluate
    results = evaluate(
              ^^^^^^^^^
  File "/Users/jasminelatendresse/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/lm-evaluation-harness/lm_eval/evaluator.py", line 468, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jasminelatendresse/lm-evaluation-harness/lm_eval/models/gguf.py", line 99, in loglikelihood
    assert False
           ^^^^^
AssertionError
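For what it's worth, the assert at gguf.py:99 seems to fire when the server's reply is missing the logprobs structure the harness expects to parse. A quick way to see what the server actually returns is to hit the endpoint directly (a minimal probe; the /v1/completions path and the logprobs/echo parameters are my assumptions about what GGUFLM sends, so adjust to your llama.cpp build):

import json
import requests

# Ask the server for an echoed prompt with per-token logprobs, the way
# an OpenAI-style completions client would. If the reply has no
# choices[0]["logprobs"]["token_logprobs"], the harness cannot score
# the continuation and ends up at the AssertionError above.
resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "prompt": "The capital of France is Paris",
        "max_tokens": 1,
        "echo": True,       # echo the prompt tokens back
        "logprobs": 10,     # request per-token log-probabilities
        "temperature": 0.0,
    },
    timeout=60,
)
print(json.dumps(resp.json(), indent=2))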

I have also tried a different task (french_bench_arc_challenge, as shown in the trace above), but I get the same error. Has anyone else run into this? Thank you.

Update: I used the code suggested here in gguf.py and it fixed the problem. However, I still think this should be considered a bug, since it should work out of the box.
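In case it helps anyone hitting the same wall, the general shape of such a workaround (a hypothetical sketch, not the linked fix: it assumes llama.cpp's native /completion endpoint and its completion_probabilities field, which vary across server versions) is to translate the native response into the token_logprobs structure the harness looks for:

import math
import requests

def native_logprobs(base_url, prompt):
    # Query llama.cpp's native endpoint with top-token probabilities enabled.
    # Field names below (completion_probabilities, probs, tok_str) are
    # assumptions about the server build and may differ on yours.
    resp = requests.post(
        f"{base_url}/completion",
        json={"prompt": prompt, "n_predict": 1, "n_probs": 10, "temperature": 0.0},
        timeout=60,
    ).json()
    tokens, token_logprobs = [], []
    for entry in resp.get("completion_probabilities", []):
        tok = entry.get("content", "")
        tokens.append(tok)
        # Probability the server assigned to the sampled token, if reported.
        p = next((c["prob"] for c in entry.get("probs", []) if c.get("tok_str") == tok), None)
        token_logprobs.append(math.log(p) if p and p > 0 else None)
    # OpenAI-style shape that the loglikelihood parsing expects.
    return {"tokens": tokens, "token_logprobs": token_logprobs}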

baberabb commented 2 months ago

looking into this here: https://github.com/EleutherAI/lm-evaluation-harness/issues/1472#issuecomment-2301827684