Agenta-AI / agenta

The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.
http://www.agenta.ai
MIT License

[AGE-474] JSON diff evaluator not working as expected #1942

Closed by mmabrouk 2 weeks ago

mmabrouk commented 3 months ago

I have configured a JSON diff evaluator using the default values:

# POST body submitted by the frontend when the evaluator is created
{
    "id": "0190fb1e-8d2e-716a-8e74-32bd5e61c570",
    "name": "new2",
    "evaluator_key": "auto_json_diff",
    "settings_values": {
        "predict_keys": false,
        "correct_answer_key": "correct_answer",
        "compare_schema_only": false,
        "case_insensitive_keys": false
    },
    "created_at": "2024-07-28 20:53:21.838663+00:00",
    "updated_at": "2024-07-28 20:53:21.838668+00:00"
}

My ground truth looks like:

{
  "CCI_edits": ["CCI 1", "CCI 3"],
  "E_M": "99214",
  "HCC": ["HCC19", "HCC59"],
  "ICD_10": ["I10", "E11.9", "Z87.891"],
  "CPT_HCPCS": ["99213", "96372", "85610", "84443"]
}

And the prediction looks like:

{
  "CCI_edits": [],
  "E_M": "99214",
  "HCC": ["HCC3", "HCC19"],
  "ICD_10": ["J45.909", "E11.9", "I10", "Z79.84"],
  "CPT_HCPCS": ["99214", "94640", "99213", "85018"]
}

Yet the evaluator fails with a division by zero, which means it counted no keys at all:

Traceback (most recent call last):
  File "/app/agenta_backend/services/evaluators_service.py", line 593, in auto_json_diff
    average_score = compare_jsons(
  File "/app/agenta_backend/services/evaluators_service.py", line 579, in compare_jsons
    average_score = cumulated_score / no_of_keys
ZeroDivisionError: float division by zero
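To make the failure mode concrete, here is a minimal sketch of a key-averaging comparison with a guard against an empty key set. It is not the actual `compare_jsons` from `evaluators_service.py`: the function `average_key_score` is hypothetical, it only reuses the variable names visible in the traceback (`cumulated_score`, `no_of_keys`), and it assumes exact value matching per key, which may differ from what the real evaluator does.

```python
from typing import Any


def average_key_score(ground_truth: dict[str, Any], prediction: dict[str, Any]) -> float:
    """Hypothetical stand-in for compare_jsons.

    Returns the fraction of ground-truth keys whose predicted value
    matches exactly. Exact matching is an assumption for illustration.
    """
    no_of_keys = 0
    cumulated_score = 0.0
    for key, expected in ground_truth.items():
        no_of_keys += 1
        if key in prediction and prediction[key] == expected:
            cumulated_score += 1.0

    # Guard: if no keys were collected, return 0.0 instead of dividing by zero.
    if no_of_keys == 0:
        return 0.0
    return cumulated_score / no_of_keys


ground_truth = {
    "CCI_edits": ["CCI 1", "CCI 3"],
    "E_M": "99214",
    "HCC": ["HCC19", "HCC59"],
    "ICD_10": ["I10", "E11.9", "Z87.891"],
    "CPT_HCPCS": ["99213", "96372", "85610", "84443"],
}
prediction = {
    "CCI_edits": [],
    "E_M": "99214",
    "HCC": ["HCC3", "HCC19"],
    "ICD_10": ["J45.909", "E11.9", "I10", "Z79.84"],
    "CPT_HCPCS": ["99214", "94640", "99213", "85018"],
}

score = average_key_score(ground_truth, prediction)  # only "E_M" matches exactly
empty_score = average_key_score({}, prediction)      # no keys: the guard applies
```

Returning 0.0 for an empty key set is one possible choice. Raising a descriptive error would arguably be better here, since a ground truth with five keys should never produce a key count of zero, and a silent 0.0 would hide whatever caused the keys to be dropped.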


AnanyaP-WDW commented 3 months ago

Hi @mmabrouk. I'd like to work on this. Can you share more details? I'm trying to replicate the issue with settings_values = {}. I don't understand why there would be a ZeroDivisionError when the ground truth JSON has valid keys.

mmabrouk commented 2 weeks ago

Fixed