DANCER (Part 1) - Githubissues

achen353 commented 2 years ago

DANCER (PART 1): Implement the ROUGE score calculation between "each sentence in the ground truth abstractive summary" and "each section". Map each sentence in the former to a section.
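
A minimal sketch of the mapping described above, under assumptions the issue does not state: ROUGE-L F1 is used as the score, tokenization is lowercase whitespace splitting, and the function names (`lcs_length`, `rouge_l_f1`, `map_sentences_to_sections`) are hypothetical and not taken from TransformerSum. A precision-only or recall-only variant may suit long sections better.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 on lowercase whitespace tokens (assumed variant)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def map_sentences_to_sections(
    summary_sentences: List[str], sections: List[str]
) -> List[int]:
    """For each summary sentence, return the index of its best-scoring section."""
    mapping = []
    for sentence in summary_sentences:
        scores = [rouge_l_f1(sentence, section) for section in sections]
        # Ties resolve to the earliest section.
        mapping.append(max(range(len(sections)), key=scores.__getitem__))
    return mapping
```

Each summary sentence is assigned to exactly one section, so the sentences mapped to a section can serve as that section's target summary.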

DUE: 11/20 Saturday 11:59pm

achen353 commented 2 years ago

This is not done yet. We still need to tune the parameters, e.g. the min/max number of sentences and tokens for the text to be summarized, and whether to use Combination or Greedy labeling.

achen353 commented 2 years ago

Several parameters in our data preprocessing affect the preprocessing result:

[greedy vs combination]: the two labeling functions built into TransformerSum, each based on a different paper https://github.com/achen353/TransformerSum/blob/a13dce1d68a28fb6403462bd2818ccbfcfa9fac2/src/convert_to_extractive.py#L509
Whether no_preprocess is set to True: when the parameter is False, text is not discarded during processing for being too long or too short; when it is True, text is filtered according to the given arguments https://github.com/achen353/TransformerSum/blob/a13dce1d68a28fb6403462bd2818ccbfcfa9fac2/src/convert_to_extractive.py#L539
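
To illustrate the difference between the two labeling strategies, here is a rough sketch. It is not the TransformerSum implementation: the scoring function is a simplified unigram-overlap F1 standing in for ROUGE, and `unigram_f1`, `greedy_oracle`, `combination_oracle`, and `max_sentences` are hypothetical names chosen for this example.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence


def unigram_f1(selected: Sequence[str], target: str) -> float:
    """Unigram-overlap F1, a simplified stand-in for ROUGE-1."""
    cand = Counter(" ".join(selected).lower().split())
    ref = Counter(target.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def greedy_oracle(sentences: List[str], target: str, max_sentences: int = 3) -> List[int]:
    """Add one sentence at a time, keeping it only if the score improves."""
    selected: List[int] = []
    best = 0.0
    while len(selected) < max_sentences:
        scored = [
            (unigram_f1([sentences[j] for j in selected + [i]], target), i)
            for i in range(len(sentences))
            if i not in selected
        ]
        if not scored:
            break
        score, idx = max(scored, key=lambda pair: pair[0])
        if score <= best:
            break  # no remaining sentence improves the score
        best = score
        selected.append(idx)
    return sorted(selected)


def combination_oracle(sentences: List[str], target: str, max_sentences: int = 3) -> List[int]:
    """Score every subset of up to max_sentences sentences and keep the best."""
    best_score, best_combo = 0.0, ()
    for size in range(1, max_sentences + 1):
        for combo in combinations(range(len(sentences)), size):
            score = unigram_f1([sentences[i] for i in combo], target)
            if score > best_score:
                best_score, best_combo = score, combo
    return list(best_combo)
```

The greedy variant evaluates far fewer candidates but can commit to a locally good sentence early; the combination variant is exhaustive up to `max_sentences` and therefore slower on long documents.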

@andywang268 could you quickly try these for me (use branch issue-17-test-params and put the results in #23):

Default: oracle_mode="greedy" and no_preprocess=False/None

python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test

oracle_mode="greedy" and no_preprocess=True

python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test --no_preprocess

oracle_mode="combination" and no_preprocess=False/None

python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test --oracle_mode combination

oracle_mode="combination" and no_preprocess=True

python convert_to_extractive.py ../datasets/billsum_extractive --split_names test --add_target_to test --oracle_mode combination --no_preprocess
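
The four runs above could also be generated from a small script. This is only a sketch: it uses the flags exactly as they appear in the commands above, assumes "greedy" is the default so the flag is omitted for it, and `BASE` and `build_commands` are hypothetical names.

```python
from itertools import product
from typing import List

# Shared arguments, copied from the commands in this thread.
BASE = [
    "python", "convert_to_extractive.py", "../datasets/billsum_extractive",
    "--split_names", "test", "--add_target_to", "test",
]


def build_commands() -> List[List[str]]:
    """Build one argument list per (oracle_mode, no_preprocess) pair."""
    commands = []
    for oracle_mode, no_preprocess in product(["greedy", "combination"], [False, True]):
        cmd = list(BASE)
        if oracle_mode != "greedy":  # greedy is treated as the default
            cmd += ["--oracle_mode", oracle_mode]
        if no_preprocess:
            cmd.append("--no_preprocess")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in build_commands():
        print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` from the directory containing convert_to_extractive.py to execute the runs in sequence.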

achen353 commented 2 years ago

Solved with #22 and #25

achen353 / TransformerSum

DANCER (Part 1) #19