I have a llama-2-7b.gguf model quantized to Q4_K_M that I run with the llama.cpp server. Inference works fine.
This is the code I use to run a test eval (hence the limit param), adapted from here, where someone reports a similar issue:
import json
from lm_eval.models.gguf import GGUFLM
from lm_eval import simple_evaluate

# Point the harness at the running llama.cpp server
lm = GGUFLM(base_url="http://localhost:8080")
results = simple_evaluate(model=lm, tasks=["french_bench_hellaswag"], device="mps", limit=10)

# Drop the per-sample outputs so the saved JSON stays small
filtered_results = {key: value for key, value in results.items() if key != "samples"}
json_filtered_results = json.dumps(filtered_results, indent=4)
with open("results.json", "w") as json_file:
    json_file.write(json_filtered_results)
Every time I run this, I get the following error:
ERROR [gguf.py:96] Invalid response for loglikelihood.
Stack trace:
Traceback (most recent call last):
File "/Users/jasminelatendresse/exp-os-assistant-redaction/francisation-llm/scripts/eval.py", line 6, in <module>
results = simple_evaluate(model=lm,tasks=["french_bench_arc_challenge"],device="mps",limit=5)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/lm-evaluation-harness/lm_eval/evaluator.py", line 296, in simple_evaluate
results = evaluate(
^^^^^^^^^
File "/Users/jasminelatendresse/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/lm-evaluation-harness/lm_eval/evaluator.py", line 468, in evaluate
resps = getattr(lm, reqtype)(cloned_reqs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jasminelatendresse/lm-evaluation-harness/lm_eval/models/gguf.py", line 99, in loglikelihood
assert False
^^^^^
AssertionError
I have tried with a different task, but I get the same error. Anyone else with this issue? Thank you.
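In case it helps with debugging: the error fires when the completion response coming back from the server has no usable choices/logprobs. A quick way to see what GGUFLM is actually parsing is to send a similar request to the server by hand. This is only a sketch, assuming the llama.cpp server exposes the OpenAI-compatible /v1/completions endpoint (which is where GGUFLM posts its requests) and that it accepts the echo and logprobs fields:

import json
import requests

# Roughly mimic the request GGUFLM builds for a loglikelihood call:
# prompt = context + continuation, with echo and logprobs enabled so the
# server returns per-token logprobs. If choices[0]["logprobs"] comes back
# missing or empty, gguf.py ends up in the "Invalid response for
# loglikelihood" branch and the assert fires.
resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "prompt": "The capital of France is Paris",
        "max_tokens": 1,
        "temperature": 0.0,
        "logprobs": 10,
        "echo": True,
    },
    timeout=60,
)
resp.raise_for_status()
print(json.dumps(resp.json()["choices"][0].get("logprobs"), indent=2))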
I used the code suggested here in the gguf.py file, and it fixed the issue. However, I still feel this should be considered a bug, since the harness should work out of the box.
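As a point of comparison, a more defensive version of that error path could validate the response and surface the raw payload instead of hitting a bare assert. This is a hypothetical sketch, not the linked fix and not the actual upstream code:

# Hypothetical helper: check a /v1/completions response before trusting its
# logprobs, and raise with the raw payload instead of a bare AssertionError.
def extract_token_logprobs(response: dict) -> list:
    choices = (response or {}).get("choices") or []
    if not choices:
        raise RuntimeError(f"No choices in completion response: {response!r}")
    logprobs = choices[0].get("logprobs") or {}
    token_logprobs = logprobs.get("token_logprobs")
    if not token_logprobs:
        raise RuntimeError(f"No token_logprobs in completion response: {response!r}")
    return token_logprobs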