danieldeutsch / sacrerouge

SacreROUGE is a library dedicated to the use and development of text generation evaluation metrics with an emphasis on summarization.

PyrEval #89

Closed by ZhangShiyue 3 years ago

ZhangShiyue commented 3 years ago

Hi, thanks for this awesome toolkit!

I encountered an error with PyrEval. I tried to run:

summary = "Dundee United Striker Nadir CIFTCI celebrated a goal by blowing a kiss at opposition goalkeeper Scott Bain . The 23 - year - old celebrated by trying to Rile Dundee No 1 Bain , but his actions came back to haunt him as the Dark Blues earned all three points thanks to further goals from Jake McPake and Paul Heffernan . Dundee's first win in a derby for more than 10 years ."

ref = "nadir ciftci celebrated by blowing a kiss at rival goalkeeper scott bain . however , ciftci was left blushing as rivals earned impressive victory . win gave hosts dundee their first derby win in more than a decade . goals from greg stewart , jake mcpake and paul heffernen secured win ."

pyreval.score(summary, [ref])

Here is the verbose log:

../Preprocess/peer_summaries
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.5 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 6.997 (s)
[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [8.2 sec].

Processing file /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Raw/peers/split/.gitkeep ... writing to /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Preprocess/peer_summaries/.gitkeep.xml
Annotating file /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Raw/peers/split/.gitkeep ... done [0.1 sec].
Processing file /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Raw/peers/split/0 ... writing to /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Preprocess/peer_summaries/0.xml
Annotating file /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Raw/peers/split/0 ... done [0.8 sec].
Processing file /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Raw/peers/split/1 ... writing to /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Preprocess/peer_summaries/1.xml
Annotating file /ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Raw/peers/split/1 ... done [0.2 sec].

Annotation pipeline timing information:
TokenizerAnnotator: 0.1 sec.
WordsToSentencesAnnotator: 0.0 sec.
POSTaggerAnnotator: 0.1 sec.
ParserAnnotator: 0.8 sec.
DependencyParseAnnotator: 0.1 sec.
TOTAL: 1.0 sec. for 120 tokens at 114.9 tokens/sec.
Pipeline setup: 9.4 sec.
Total time for StanfordCoreNLP pipeline: 10.6 sec.
DECOMPOSING SENTENCES FROM SUMMARY ../Preprocess/peer_summaries/0.xml
VECTORIZING SEGMENTS FROM SUMMARY ../Preprocess/peer_summaries/0.xml
/ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Preprocess/ormf/ormf.py:112: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions. To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
  segment.setVector(np.linalg.lstsq(num, den)[0])
DECOMPOSING SENTENCES FROM SUMMARY ../Preprocess/peer_summaries/1.xml
VECTORIZING SEGMENTS FROM SUMMARY ../Preprocess/peer_summaries/1.xml
Time: 1.70384001732

Welcome to the PyrEval Launcher.

NOTES:

0: Automatic mode (not recommended)

1: Preprocess - Split sentences
2: Run Stanford Core NLP Tools
3: Preprocess - Main
4: Build pyramids
5: Score pyramids

c: Clean directories
i: Change python interpreter

To quit, type nothing and press return.

['../Preprocess/wise_crowd_summaries/0.xml', '../Preprocess/wise_crowd_summaries/1']
4 4 4
Traceback (most recent call last):
  File "/ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Pyramid/pyramid.py", line 94, in <module>
    BigSet2 = pairwise(segmentpool, N, threshold)
  File "/ssd-playpen/home/shiyue/cache/sacrerouge/metrics/PyrEval/Pyramid/lib_pyramid.py", line 341, in pairwise
    Q3 = np.percentile(np.asarray(scores), threshold)
  File "/ssd-playpen/home/shiyue/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3707, in percentile
    a, q, axis, out, overwrite_input, interpolation, keepdims)
  File "/ssd-playpen/home/shiyue/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3826, in _quantile_unchecked
    interpolation=interpolation)
  File "/ssd-playpen/home/shiyue/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3405, in _ureduce
    r = func(a, **kwargs)
  File "/ssd-playpen/home/shiyue/anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3941, in _quantile_ureduce_func
    x1 = take(ap, indices_below, axis=axis) * weights_below
  File "/ssd-playpen/home/shiyue/anaconda2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 189, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/ssd-playpen/home/shiyue/anaconda2/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 56, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.
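The final frame of that traceback can be reproduced in isolation, which suggests scores is empty by the time np.percentile is called. A minimal sketch (the 65 is just an illustrative threshold, not necessarily the one PyrEval passes, and this exact error message comes from the numpy 1.x / Python 2.7 stack above; newer numpy versions may fail differently):

```python
import numpy as np

# The bottom of the traceback in isolation: calling percentile on an
# empty array fails with "cannot do a non-empty take from an empty axes."
np.percentile(np.asarray([]), 65)  # 65 is an illustrative threshold
```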

I did run pytest sacrerouge/tests/metrics/pyreval_test.py, which looks mostly normal to me; one test fails, but only because the scores differ slightly from the expected values:

E AssertionError: Instance 1 not equal.
  Expected {'pyreval': {'raw': 16, 'quality': 0.47058823529411764, 'coverage': 0.3404255319148936, 'comprehensive': 0.4055068836045056}},
  actual {'pyreval': {'raw': 17, 'quality': 0.5, 'coverage': 0.3617021276595745, 'comprehensive': 0.4308510638297872}}

sacrerouge/common/testing/metric_test_cases.py:42: AssertionError
=============================== short test summary info ================================
FAILED sacrerouge/tests/metrics/pyreval_test.py::TestPyrEval::test_pyreval - Assertio...
======================= 1 failed, 3 passed in 377.74s (0:06:17) ========================

So, I don't know why this error always occurs when I evaluate my examples. I also tried some other [summary, ref] pairs, but they all throw the same error. Do you have any idea why this happens? Any hint would be helpful! Thank you so much!

danieldeutsch commented 3 years ago

Hi,

I think this is a bug in the original PyrEval code: it crashes if there is only 1 reference summary. I've had this problem before too.

When the pyramid is constructed, a pairwise similarity set is created here: https://github.com/serenayj/PyrEval/blob/b44bb991cf82c30e473b02534e1dbc2687747091/Pyramid/pyramid.py#L84-L94

That calls the pairwise function, which gets all combinations of the segments across the reference summaries via the combinations function: https://github.com/serenayj/PyrEval/blob/b44bb991cf82c30e473b02534e1dbc2687747091/Pyramid/lib_pyramid.py#L306-L312. Because you have only 1 reference, summs has length 1, so taking all pairwise combinations produces an empty summ_pairs, which in turn leaves scores empty (see the sketch below).
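A minimal sketch of that failure mode (the names summs, summ_pairs, and scores mirror the PyrEval source linked above, but the values are illustrative):

```python
import itertools

import numpy as np

# With a single reference summary, summs has exactly one entry.
summs = ["only_reference"]

# All 2-element combinations of a 1-element list: empty.
summ_pairs = list(itertools.combinations(summs, 2))
print(summ_pairs)  # []

# With no pairs, no similarity scores are ever appended, so the
# percentile call in pairwise() operates on an empty array and raises
# the IndexError from your traceback.
scores = []
np.percentile(np.asarray(scores), 65)  # 65 is an illustrative threshold
```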

I think if you do need to run it with 1 reference summary, it's probably best to open an issue here.

It's also worth noting that if you do have multiple references, the score depends on the order of the references, which ends up being platform dependent. See here and my note about it here.

ZhangShiyue commented 3 years ago

Hi, thanks for your prompt reply!

I tried duplicating the reference: pyreval.score(summary, [ref, ref]). There is no error. Do you think this is a valid workaround or not?
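If it is valid, the workaround could be wrapped in a guard like the following (a hypothetical sketch; score_single_reference is not part of SacreROUGE, and whether a duplicated reference yields meaningful pyramid scores is exactly my open question):

```python
# Hypothetical guard: duplicate a lone reference so that PyrEval's
# pairwise step has at least one pair to compare.
# NOTE: this only avoids the crash; the validity of scoring against a
# duplicated reference is unconfirmed.
def score_single_reference(pyreval, summary, references):
    if len(references) == 1:
        references = references * 2  # [ref] -> [ref, ref]
    return pyreval.score(summary, references)
```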

danieldeutsch commented 3 years ago

I am not sure, sorry. You would have to ask the authors of PyrEval; I only wrote a wrapper around their implementation.

ZhangShiyue commented 3 years ago

No worries. Sure, thank you so much!

danieldeutsch commented 3 years ago

If it gets fixed upstream, I am happy to merge the changes here. Thanks!