Yale-LILY / SummEval

Resources for the "SummEval: Re-evaluating Summarization Evaluation" paper
MIT License

How to find the reference summary that the expert used to grade a given machine summary #53

Open erastogi opened 4 months ago

erastogi commented 4 months ago

I am looking at this example:


**doc_id:** "dm-test-eeef09d26cf30c2124c0399b08eedc6321fe5d20",
**system_id:** "M11",
**source:** "Team-mates Neymar and Dani Alves proved their dedication to Barcelona by supporting the club ’ s basketball side . Neymar and Alves headed to watch El Clasico on Thursday night alongside the Brazilian 's sister Rafaella . Barca prevailed with a narrow 85-80 victory in the Euro League contest . Brazil star Neymar ( centre ) takes a selfie with friends and Barcelona team-mate Dani Alves ( right ) However Real Madrid remain top of their Euro League division over their bitter rivals , just by points difference . Neymar helped Brazil beat Chile 1-0 at the Emirates Stadium on Sunday in a feisty contest and had to withstand several brutal challenges from the South American rivals . Before the international break Luis Enrique 's Barcelona had won their El Clasico contest to move four points clear at the top of La Liga . Neymar and his Barcelona team-mates return to La Liga on Sunday , taking on Celta Vigo as they continue to compete for a treble of trophies . Neymar 's sister Rafaella ( left ) headed to watch El Clasico of basketball with the Barcelona forward Neymar 's sister Rafaella ( right ) attends a Euro League basketball match between Barcelona and Real Madrid Neymar is distracted at the basketball as his sister Rafaella ( centre ) watches on Neymar , Brazil 's captain , led his national side to a 1-0 win over Chile at the Emirates last weekend Barcelona team-mate Dani Alves ( front left ) joined Neymar and friends to watch some live basketball Gary Medel walks off having appeared to stamp on Neymar at the Emirates on Sunday",
**system_output:** "Real madrid beat chile 1-0 in el clasico on thursday . Dani alves and dani alves scored the goals in the euro league contest . Neymar and alves joined the club 's sister rafaella on sunday . Real madrid remain top of their euro league rivals at the emirates ."
**reference (from the dataset):** "Neymar helped Brazil beat Chile 1-0 at the Emirates stadium last weekend . Barcelona won the El Clasico to go four points clear at the top of La Liga . Luis Enrique 's side take on Celta Vigo in La Liga on Sunday .",
**Expert relevance scores:** [1, 1, 1]

When I look at the reference summary (called "highlights" in the original dataset), the given system output (i.e., the machine summary) seems quite relevant/similar to it. Hence, a relevance score of 1 from the experts doesn't make sense to me.

In the paper, you mention:

"The data collection interface provided judges with the source article and associated summaries grouped in sets of 5. Each group of summaries contained the reference summary associated with the source article to establish a common point of reference between groups."

I was wondering which "reference" summary you are referring to in this section. Is it the one from the original dataset, or one of the 10 summaries written by humans? If the latter, is there a way to find out which reference summary the expert used to score a given machine summary?

carlesoctav commented 6 days ago

Hi, have you figured this out?

vaahtio commented 3 days ago

I'm also interested in this

Alex-Fabbri commented 3 days ago

Hi! The quote from the paper means that the reference summary (the one from the original dataset) is also one of the 5 summaries being scored by the annotator. It is not being used to score the other summaries. Let us know if you have other questions!
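
For anyone working with the annotations, here is a minimal sketch of what this implies in practice: since the reference is not used to score the other summaries, there is no per-score reference to look up, and each expert score can be read as belonging to the (`doc_id`, `system_id`) pair alone. The record layout below mirrors the fields quoted in the question; the key name `expert_relevance` and both helper functions are assumptions made for illustration, not the dataset's confirmed schema.

```python
from statistics import mean

# Hypothetical record layout, mirroring the fields quoted in the question.
# "expert_relevance" is an assumed key name, not one confirmed by the dataset.
records = [
    {
        "doc_id": "dm-test-eeef09d26cf30c2124c0399b08eedc6321fe5d20",
        "system_id": "M11",
        "expert_relevance": [1, 1, 1],
    },
]


def mean_expert_relevance(record):
    """Average the per-expert relevance scores for one summary."""
    return mean(record["expert_relevance"])


def scores_by_document(records):
    """Map each doc_id to {system_id: mean expert relevance}."""
    grouped = {}
    for rec in records:
        per_system = grouped.setdefault(rec["doc_id"], {})
        per_system[rec["system_id"]] = mean_expert_relevance(rec)
    return grouped


by_doc = scores_by_document(records)
```

Grouping by `doc_id` keeps all summaries of one article together, so the scores of different systems can be compared side by side without assuming any reference was involved in producing them.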