Hi, the inputs of string_match_all() are all sample predictions [str, str, ...] and reference answers [[str, str, ...], [str, str, ...], ...]. Here is the line where we use this metric function.
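For reference, the expected argument shapes look roughly like this (illustrative values only, not real benchmark data):

```python
# One prediction string per data point:
preds = ["model output for data point 1", "model output for data point 2"]
# One list of reference answers per data point:
refs = [["gold term a", "gold term b"], ["gold term c"]]
```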
Sorry, I did not follow. The prediction is the text output from the LLM, right? (I don't think that is being postprocessed into a list.)
The ground truth is the list of correct terms, as shown in the example above, right?
Basically, what I am missing is the step where/how we convert the LLM text output into [str, str, ...].
Thanks for your help.
Edit - I am talking about ONE datapoint. I am trying to run this for a single test case and a single task.
Yep, I know you are talking about one data point. However, string_match_all() is designed to evaluate multiple data points, with preds=[str, str, ...] and refs=[[str, str, ...], [str, str, ...], ...]. So in your case, your input should be a list like the following:
preds = ['1. arthur 2. kilt 3. fire 4. meter 5. appliance 6. behalf 7. forest 8. activity 9. authenticity 10. ferret']
refs = [['appliance', 'meter', 'forest', 'ferret', 'kilt', 'behalf', 'fire', 'activity', 'arthur', 'authenticity']]
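If it helps, this is roughly how a metric with that interface behaves. string_match_all_sketch below is my own illustrative stand-in, not the actual RULER implementation; with the batched inputs above it gives full credit:

```python
def string_match_all_sketch(preds, refs):
    # Per data point: fraction of reference terms that appear (case-insensitively)
    # as substrings of the prediction; averaged over data points, scaled to 0-100.
    per_sample = [
        sum(r.lower() in pred.lower() for r in ref) / len(ref)
        for pred, ref in zip(preds, refs)
    ]
    return round(100 * sum(per_sample) / len(preds), 2)

preds = ['1. arthur 2. kilt 3. fire 4. meter 5. appliance 6. behalf 7. forest 8. activity 9. authenticity 10. ferret']
refs = [['appliance', 'meter', 'forest', 'ferret', 'kilt', 'behalf', 'fire', 'activity', 'arthur', 'authenticity']]
print(string_match_all_sketch(preds, refs))  # 100.0 — every reference term is found in the prediction
```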
Ah, got it. I adapted the code for my use but missed that the inputs are batched rather than single. Resolved.
Thanks for the help!
Hi Authors,
I am trying hard to understand the post-processing format before the evaluation step. For a particular case I am looking at, the prediction (LLM output) is a string, e.g.:
'1. arthur 2. kilt 3. fire 4. meter 5. appliance 6. behalf 7. forest 8. activity 9. authenticity 10. ferret'
The corresponding ground truth is ['appliance', 'meter', 'forest', 'ferret', 'kilt', 'behalf', 'fire', 'activity', 'arthur', 'authenticity'].
When I run the string_match_all function, the output is 0.31. This does not look right.
Specifically, this line, https://github.com/hsiehjackson/RULER/blob/main/scripts/eval/synthetic/constants.py#L29, is doing a zip between a string and a list, which would be a character-wise zip.
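To illustrate what I mean: if I pass my single prediction string and the flat ground-truth list directly, the zip pairs each character of the string with a whole term (pred and gt here are just the example values above):

```python
pred = '1. arthur 2. kilt 3. fire 4. meter 5. appliance 6. behalf 7. forest 8. activity 9. authenticity 10. ferret'
gt = ['appliance', 'meter', 'forest', 'ferret', 'kilt', 'behalf', 'fire', 'activity', 'arthur', 'authenticity']

# zip over a raw string iterates its characters, so each character gets
# paired with a whole ground-truth term:
print(list(zip(pred, gt))[:3])
# [('1', 'appliance'), ('.', 'meter'), (' ', 'forest')]
```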
Where am I going wrong?