Open us2547 opened 2 days ago
@dragonstyle Could you take a look at this?
It seems the problem is not linked only to the scorer that outputs a string. After commenting that scorer out, the problem was reproduced (with a different scorer failing validation). A run on the same dataset but with a limited number of samples works, so I suspect the problem is somehow related to a specific sample. Is there a way to debug the EvalLog to identify the root cause? The problem was reproduced with the latest inspect version.
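To locate the offending sample, one option is to scan the raw log JSON directly rather than going through `read_eval_log`. A minimal sketch (not part of inspect-ai; the helper name and path layout are my own) that reports the JSON paths of any null `"value"` fields, which appear to be what trips validation:

```python
import json

def find_null_values(node, path="$"):
    """Recursively collect JSON paths whose 'value' key is null."""
    hits = []
    if isinstance(node, dict):
        for key, child in node.items():
            if key == "value" and child is None:
                hits.append(f"{path}.{key}")
            hits.extend(find_null_values(child, f"{path}.{key}"))
    elif isinstance(node, list):
        for i, child in enumerate(node):
            hits.extend(find_null_values(child, f"{path}[{i}]"))
    return hits

def scan_log(log_path):
    # Load the EvalLog file as plain JSON, bypassing type validation.
    with open(log_path) as f:
        return find_null_values(json.load(f))
```

Running `scan_log("logs/my-eval.json")` would then print paths like `$.samples[0].value`, pointing at the specific sample or metric that fails.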
I believe the root cause of the problem is the scorer outputting a `"value"` of null. After changing null values to zero, the log parser works.
"metrics": {
"accuracy": {
"name": "accuracy",
"value": null,
"options": {}
}
}
or from "samples":
{
"value": null,
"answer": "Answer .....",
"explanation": "Explanation .....",
"metadata": {
"faithfulness_score": null
},
"sample_id": "id-2024-5"
},
Is the issue that the value in some cases is being returned as NaN? I could see that we would serialize that to null, and that our type validation wouldn't allow it to pass, since null isn't a valid value for a score.
The value returned was `numpy.nan`, which was converted to a JSON `null` by inspect-ai during serialization. The `Score` class allows setting `value` to `numpy.nan`, and there are no warnings or errors when doing so; they appear only when reading the log back with the inspect utility.
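Until scorers reject NaN at construction time, a small guard in user code can catch this early. An illustrative helper (not an inspect-ai API) that coerces a scorer's raw value before it is handed to `Score`:

```python
import math

def safe_score_value(value, default=0.0):
    """Coerce None/NaN score values, which serialize to null and
    later fail EvalLog validation, to a safe default."""
    if value is None:
        return default
    # numpy float scalars subclass float, so math.isnan covers numpy.nan too
    if isinstance(value, float) and math.isnan(value):
        return default
    return value
```

Calling `safe_score_value(numpy.nan)` would return `0.0` instead of letting the NaN reach the serialized log.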
When using `read_eval_log` to read the EvalLog (json), I receive the error below. The problematic log is not consistent; it happens only rarely. Also, when using the `EvalLog` object as an output from the run (not read from a file), iterating over the object works fine. One observation: it seems the problem is related to a scorer that outputs a `"string"` and doesn't have a `"metric"`. This seems related to issue #775.

Unfortunately, the problematic file is very large. Example error trace below; the place `"2"` is used by the scorer that outputs a `"string"`.