Maluuba / nlg-eval

Evaluation code for various unsupervised automated metrics for Natural Language Generation.
http://arxiv.org/abs/1706.09799

Problem with "the object oriented API for repeated calls in a script - multiple examples" #135

Open razaviah opened 2 years ago

razaviah commented 2 years ago

Hello,

I have a problem with how the "object oriented API for repeated calls in a script - multiple examples" works. As I understand it, I need a list of hypotheses and a list of lists of references, for example hyps=['aaa', 'bbb'] and refs=[['aa', 'aaa', 'aaaa'], ['bb', 'bbb']]. Is that correct?

If so, I am getting an error saying that the lengths of my hypotheses and references are not equal, even though they are:

len references: 6724
len hypothesis: 6724
type references (should be list): <class 'list'>
type references[0] (should be list): <class 'list'>
type references[0][0] (should be string): <class 'str'>
type hypothesis (should be list): <class 'list'>
type hypothesis[0] (should be string): <class 'str'>

As you can see above, the lengths are the same, so this should not be a problem.

Here is the code that raises the error:

nlgeval = NLGEval()
metrics_dict = nlgeval.compute_metrics(references, hypothesis)

And here is the error itself:

Traceback (most recent call last):
  File "/project/6007095/ahr91/references/scoring.py", line 46, in <module>
    metrics_dict = nlgeval.compute_metrics(references, hypothesis)
  File "/home/ahr91/.local/lib/python3.9/site-packages/nlgeval/__init__.py", line 292, in compute_metrics
    assert len(refs) == len(hyps)
AssertionError

If you need more information, please let me know.

juharris commented 2 years ago

Maybe one of the inner lists in refs doesn't have enough items? It's best if each inner list is the same length.
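
A quick way to check this suggestion is to count how many inner lists have each length. The sketch below is plain Python and does not call nlg-eval; the helper name `inner_length_counts` and the sample data are hypothetical, chosen only to mirror the layout described in the question.

```python
from collections import Counter


def inner_length_counts(references):
    """Count how many inner reference lists have each length."""
    return Counter(len(inner) for inner in references)


# Hypothetical data in the per-hypothesis layout from the question.
refs = [['aa', 'aaa', 'aaaa'], ['bb', 'bbb']]
counts = inner_length_counts(refs)

# More than one key means the inner lists are not all the same length.
print(dict(counts))
```

If the result has a single key, every inner list has the same number of items, and uneven inner lists can be ruled out as the cause.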

razaviah commented 2 years ago

I have even tried with 1 item in each inner list and it still gives the same error. Can you try it yourself and tell me if it works or not for you?

razaviah commented 2 years ago

@juharris Did you have time to test it? I still cannot get it working properly.

temporaer commented 2 years ago

Can you run this with pdb and print the two lists causing the discrepancy?
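
One way to do this is to run the script under the debugger with `python -m pdb scoring.py`, let it stop at the AssertionError, and print `len(refs)` and `len(hyps)` in the failing frame. The sketch below reproduces that kind of comparison without the library. It rests on an assumption that this thread does not confirm: that `compute_metrics` transposes the references with something like `zip(*ref_list)` before the length check. The helper name `lengths_seen_by_assert` and the sample data are hypothetical.

```python
def lengths_seen_by_assert(ref_list, hyp_list):
    """Return the two lengths a check like `len(refs) == len(hyps)` would compare.

    ASSUMPTION: the references are transposed with zip(*ref_list) first.
    This mimics a guess about the library's internals, not its confirmed code.
    """
    transposed = list(zip(*ref_list))
    return len(transposed), len(hyp_list)


hyps = ['a', 'b', 'c']

# Layout from the question: one inner list per hypothesis.
per_hypothesis = [['a1', 'a2'], ['b1', 'b2'], ['c1', 'c2']]

# Alternative layout: one inner list per reference set, each as long as hyps.
per_reference_set = [list(column) for column in zip(*per_hypothesis)]

print(lengths_seen_by_assert(per_hypothesis, hyps))     # (2, 3)
print(lengths_seen_by_assert(per_reference_set, hyps))  # (3, 3)
```

Under that assumption, the per-hypothesis layout would fail the check whenever the number of references per hypothesis differs from the number of hypotheses, which would also explain why one item per inner list still raises the error. Printing the real values in pdb would confirm or rule this out.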