Closed: dfberry closed this issue 8 months ago
Please change line 44 in generate.py to:
qa.append({"question": question, "truth": answer + citation})
Merged fix for this, thanks for the report!
I'm still having problems with this. If I print out the value, it says None. Should the fn take that into account?
def passes_threshold(rating):
    if rating is None:
        return False
    return int(rating) >= 4
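For example, with that guard in place the check no longer raises on a missing rating (the sample ratings below are made up):

ratings = ["5", "3", None, "4"]  # made-up sample of GPT metric ratings
print([passes_threshold(r) for r in ratings])  # prints [True, False, False, True]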
Okay, it didn't calculate a metric for some reason. Can you share the logs?
I can take it into account, but it usually indicates an error somewhere earlier in the script, so I'd want to put a helpful message for debugging.
When you get this running again, please share the full output from your session, and I can try to pinpoint why you'd have a None value.
The full output is at the top of the issue. Can you see if there is something there that can help pin down the issue or give me the next step? @pamelafox
Ah okay, so that looks like the error from before, when generate created example ground-truth data that used "answer" as the column name. Can you check qa-2.jsonl and make sure the column is named "truth"?
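If it helps, here is a quick way to check the column names in qa-2.jsonl (a sketch, assuming one JSON object per line):

import json

with open("qa-2.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        keys = set(json.loads(line).keys())
        if "truth" not in keys:
            print(f"line {i}: expected a 'truth' column, found {sorted(keys)}")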
Ok, it's working now with that change and all the ratings are coming back. I'll close this.
I am getting the same issue; sometimes the rating is None. I did not understand how you fixed it @dfberry.
I've added a print to make sure the keys are correct:
Note that this only happens occasionally, but the bigger the dataset, the more often it happens. @pamelafox, it seems to be an issue in the metrics computation (by GPT-4). I hope this is not related to using a language other than English :(
return int(rating) >= 4
^^^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
I'm wondering if replacing None with the min value (1) would be a valid solution.
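A minimal sketch of that substitution, assuming the rating arrives as a string or None:

def passes_threshold(rating):
    # Assumption: treat a missing rating as the minimum score of 1,
    # so the pair simply fails the threshold instead of raising.
    if rating is None:
        rating = 1
    return int(rating) >= 4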
During my tests, this seems to happen only with the GPT Relevance metric and on different QA pairs (sometimes it works, sometimes I get None).
For now I've just changed the code to:
def passes_threshold(rating):
    if rating is None:
        return False
    return int(rating) >= 4
Did you notice anything in the logs about rate limits being exceeded? That can happen with your GPT-4 instance. I should probably make the change you have there, but add a warning about missing data, and count up how much missing data there is.
Did you notice anything in the logs about rate limits being exceeded? That can happen with your GPT-4 instance.
No, I did not.
I should probably make the change you have there, but add a warning about missing data, and count up how much missing data there is. That should work.
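A rough sketch of that idea (the result and metric names here are assumptions, not the actual script's):

import logging

def count_missing(results, metric="gpt_relevance"):
    # Hypothetical helper: count results where the GPT metric came back as None
    # and warn so the missing data is visible in the summary.
    missing = sum(1 for r in results if r.get(metric) is None)
    if missing:
        logging.warning("%d of %d results have no %s rating; treating them as failing.",
                        missing, len(results), metric)
    return missing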
Another option would be to add a retry for the QA pairs that get None as the metric. For example: finish the current run, check which metrics came back as None, and retry only those QA pairs until no Nones remain.
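Something along these lines, as a sketch; evaluate_one here is a stand-in for whatever function computes the metric for a single QA pair:

def fill_missing_ratings(results, evaluate_one, metric="gpt_relevance", max_retries=3):
    # Hypothetical retry loop: re-run the metric only for pairs whose rating is None.
    for _ in range(max_retries):
        pending = [r for r in results if r.get(metric) is None]
        if not pending:
            break
        for r in pending:
            r[metric] = evaluate_one(r)  # may still return None; retried on the next pass
    return results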
console looked like:
error looked like:
eval_results.jsonl is: