confident-ai / deepeval

The LLM Evaluation Framework
https://docs.confident-ai.com/

ValueError: Evaluation LLM outputted an invalid JSON. Please use a better evaluation model. #929

Open · JingsenZhang opened this issue 3 months ago

JingsenZhang commented 3 months ago

Problem: When I use a local Llama-7B model for the faithfulness calculation, the following error occurs. Is it because this metric does not support smaller models? Does it only support closed-source LLMs such as GPT-3.5?

ERROR:

```
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.60s/it]
Traceback (most recent call last):
  File "/python3.9/site-packages/deepeval/metrics/faithfulness/faithfulness.py", line 241, in _a_generate_truths
    res: Truths = await self.model.a_generate(prompt, schema=Truths)
TypeError: a_generate() got an unexpected keyword argument 'schema'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/deepeval/metrics/utils.py", line 79, in trimAndLoadJson
    return json.loads(jsonStr)
  File "/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/python3.9/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 7 column 1 (char 163)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 19, in <module>
    faith_score = faithfulness_score(generated_list, evidence_list, model_name_or_path, device)
  File "faithfulness.py", line 51, in faithfulness_score
    metric.measure(test_case)
  File "/deepeval/metrics/faithfulness/faithfulness.py", line 59, in measure
    loop.run_until_complete(
  File "python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/deepeval/metrics/faithfulness/faithfulness.py", line 94, in a_measure
    self.truths, self.claims = await asyncio.gather(
  File "/deepeval/metrics/faithfulness/faithfulness.py", line 245, in _a_generate_truths
    data = trimAndLoadJson(res, self)
  File "/deepeval/metrics/utils.py", line 84, in trimAndLoadJson
    raise ValueError(error_str)
ValueError: Evaluation LLM outputted an invalid JSON. Please use a better evaluation model.
```

tm-miko-planas commented 3 weeks ago

Hello, I've been experiencing the same issue, except I'm using an Azure OpenAI instance.

Have you had any luck in solving this?