achen353 / TransformerSum

BERT-based extractive summarizer for long legal documents using a divide-and-conquer approach
GNU General Public License v3.0

Fix DANCER implementation #25

Closed by achen353 2 years ago

achen353 commented 2 years ago

Context

The implementation of #22 has two problems:

  1. When creating section-level summaries, ROUGE scores are calculated between each summary sentence and each section as a whole. The original DANCER implementation instead calculates ROUGE scores between each summary sentence and each individual sentence of the original text.
  2. With the current DANCER implementation, a section often has only one sentence, or none at all, assigned label 1 (that is, selected for the extractive section summary).

Solution

  1. Fix Problem 1: calculate ROUGE scores between each summary sentence and each individual sentence of the original text, not each section as a whole.
  2. After fixing Problem 1, assign each summary sentence to the top k sections (default 3) instead of only the top 1.
  3. Allow an alternative text-summary alignment: match each section with the entire document-level abstractive summary. To enable it, set the by_section parameter of assign_section_level_summaries() to False (the default is True).
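
The first two points can be illustrated with a minimal sketch. It is not the code from this repository: `assign_top_k_sections`, `rouge_l_f1`, and `lcs_length` are hypothetical names, the ROUGE-L score is a simplified whitespace-token version, and scoring a section by its best-matching sentence is an assumption about how sentence-level scores are aggregated per section.

```python
from typing import Dict, List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1 on lowercased whitespace tokens (assumption)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def assign_top_k_sections(
    summary_sentences: List[str],
    sections: List[List[str]],
    k: int = 3,
) -> Dict[int, List[str]]:
    """Map each section index to the summary sentences assigned to it.

    Hypothetical helper: each section is a list of sentences, and a section's
    score for a summary sentence is the best score among its sentences.
    """
    assignments: Dict[int, List[str]] = {i: [] for i in range(len(sections))}
    for summ_sent in summary_sentences:
        # Compare against every single sentence, not the section as a whole.
        scores = [
            max((rouge_l_f1(summ_sent, sent) for sent in section), default=0.0)
            for section in sections
        ]
        # Rank sections by score; ties keep their original order.
        ranked = sorted(range(len(sections)), key=lambda i: scores[i], reverse=True)
        for i in ranked[:k]:
            assignments[i].append(summ_sent)
    return assignments


sections = [
    ["the court dismissed the appeal", "costs were awarded"],
    ["the statute was enacted in 1990"],
    ["unrelated text here"],
]
summary = ["the court dismissed the appeal"]
top_1 = assign_top_k_sections(summary, sections, k=1)
top_2 = assign_top_k_sections(summary, sections, k=2)
```

With k greater than 1, a summary sentence reaches more sections, which is the intended remedy for sections ending up with one or no positive labels. Note that this sketch always assigns to k sections even when a score is 0; a real implementation might want a minimum-score threshold.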