achen353 / TransformerSum

BERT-based extractive summarizer for long legal documents using a divide-and-conquer approach
GNU General Public License v3.0

DANCER (Part 1) #19

Closed achen353 closed 2 years ago

achen353 commented 2 years ago

DANCER (PART 1): Implement the ROUGE score calculation between "each sentence in the ground truth abstractive summary" and "each section". Map each sentence in the former to a section.

DUE: 11/20 Saturday 11:59pm
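
A minimal sketch of the sentence-to-section mapping described in this task, under some loud assumptions: the thread does not say which ROUGE variant or tokenizer is used, so this uses a simplified ROUGE-L F1 (longest common subsequence over lowercased, whitespace-split tokens) as a stand-in, and assigns each summary sentence to its highest-scoring section. The function names (`lcs_length`, `rouge_l_f1`, `map_sentences_to_sections`) and the sample text are hypothetical and not taken from the repository.

```python
from typing import Dict, List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1 (assumption: lowercase + whitespace tokens)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def map_sentences_to_sections(
    summary_sentences: List[str], sections: List[str]
) -> Dict[int, int]:
    """Map each summary sentence index to the index of its best-scoring section.

    Assumes at least one section; ties go to the earliest section.
    """
    mapping = {}
    for i, sentence in enumerate(summary_sentences):
        scores = [rouge_l_f1(sentence, section) for section in sections]
        mapping[i] = max(range(len(sections)), key=scores.__getitem__)
    return mapping


# Made-up sample data, for illustration only.
sections = [
    "This Act may be cited as the Clean Water Act.",
    "The Secretary shall award grants to eligible states for water projects.",
]
summary_sentences = [
    "The Secretary awards grants to states.",
    "The bill is cited as the Clean Water Act.",
]
print(map_sentences_to_sections(summary_sentences, sections))
```

In practice a maintained ROUGE implementation with proper tokenization and stemming would likely replace `rouge_l_f1`; the mapping loop itself would stay the same.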

achen353 commented 2 years ago

This is not done yet. We still need to tune the parameters (e.g., the min/max number of sentences/tokens for the to-be-summarized text, and whether to use Combination or Greedy labeling).

achen353 commented 2 years ago

Our data preprocessing has several parameters that affect the preprocessing result:

@andywang268 could you quickly try these for me (use branch issue-17-test-params and put the results in #23):

  1. Default: oracle_mode="greedy", no_preprocess=False/None
    python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test
  2. oracle_mode="greedy", no_preprocess=True
    python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test --no_preprocess
  3. oracle_mode="combination", no_preprocess=False/None
    python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test --oracle_mode combination
  4. oracle_mode="combination", no_preprocess=True
    python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test --oracle_mode combination --no_preprocess
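
The four runs above are the 2x2 grid of `oracle_mode` and `no_preprocess`, so they could also be generated rather than typed by hand. The sketch below only builds and prints the command lines; it uses the script name, dataset path, and flags exactly as they appear in the list, while `BASE` and `build_commands` are hypothetical helper names. It assumes `greedy` is the default oracle mode, as the first item states.

```python
from itertools import product

# Common part of every command, copied from the list above.
BASE = [
    "python", "convert_to_extractive.py", "../datasets/billsum_extractive",
    "--split_names", "test", "--add_target_to", "test",
]


def build_commands():
    """Return the four argument lists, in the same order as the list above."""
    commands = []
    for oracle_mode, no_preprocess in product(["greedy", "combination"], [False, True]):
        cmd = list(BASE)
        if oracle_mode != "greedy":  # greedy is the stated default, so no flag needed
            cmd += ["--oracle_mode", oracle_mode]
        if no_preprocess:
            cmd.append("--no_preprocess")
        commands.append(cmd)
    return commands


for cmd in build_commands():
    print(" ".join(cmd))
```

To actually execute a run, each list could be passed to `subprocess.run(cmd, check=True)` from the directory that contains `convert_to_extractive.py`.
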
achen353 commented 2 years ago

Solved with #22 and #25