BangLiu / ArticlePairMatching

The code of ACL 2019 paper: Matching Article Pairs with Graphical Decomposition and Convolutions

Question about the repeated dataset2featurefile calls in the __main__ block of feature_extractor.py #19

Closed: jc-ryan closed this issue 4 years ago

jc-ryan commented 4 years ago

Hello, feature_extractor.py contains the following statements under `if __name__ == "__main__":`

# debug with a few lines

dataset2featurefile(
    "../../../../data/raw/event-story-cluster/same_event_doc_pair.txt",
    "../../../../data/processed/event-story-cluster/same_event_doc_pair.cd.debug.json",
    "label", "category1", "time1", "time2", "content1", "content2",
    ["keywords1", "ner_keywords1"], ["keywords2", "ner_keywords2"],
    col_title1=None, col_title2=None, use_cd=True,
    draw_fig=True, parallel=False, extract_range=range(2), print_fig=True)

# process data
dataset2featurefile(
    "../../../../data/raw/event-story-cluster/same_event_doc_pair.txt",
    "../../../../data/processed/event-story-cluster/same_event_doc_pair.cd.json",
    "label", "category1", "time1", "time2", "content1", "content2",
    ["keywords1", "ner_keywords1"], ["keywords2", "ner_keywords2"],
    col_title1="title1", col_title2="title2", use_cd=True,
    draw_fig=False, parallel=True, extract_range=None,
    betweenness_threshold_coef=1.0, max_c_size=6, min_c_size=2)
dataset2featurefile(
    "../../../../data/raw/event-story-cluster/same_story_doc_pair.txt",
    "../../../../data/processed/event-story-cluster/same_story_doc_pair.cd.json",
    "label", "category1", "time1", "time2", "content1", "content2",
    ["keywords1", "ner_keywords1"], ["keywords2", "ner_keywords2"],
    col_title1="title1", col_title2="title2", use_cd=True,
    draw_fig=False, parallel=True, extract_range=None,
    betweenness_threshold_coef=1.0, max_c_size=6, min_c_size=2)
dataset2featurefile(
    "../../../../data/raw/event-story-cluster/same_event_doc_pair.txt",
    "../../../../data/processed/event-story-cluster/same_event_doc_pair.no_cd.json",
    "label", "category1", "time1", "time2", "content1", "content2",
    ["keywords1", "ner_keywords1"], ["keywords2", "ner_keywords2"],
    col_title1="title1", col_title2="title2", use_cd=False,
    draw_fig=False, parallel=True, extract_range=None,
    betweenness_threshold_coef=1.0, max_c_size=6, min_c_size=2)
dataset2featurefile(
    "../../../../data/raw/event-story-cluster/same_story_doc_pair.txt",
    "../../../../data/processed/event-story-cluster/same_story_doc_pair.no_cd.json",
    "label", "category1", "time1", "time2", "content1", "content2",
    ["keywords1", "ner_keywords1"], ["keywords2", "ner_keywords2"],
    col_title1="title1", col_title2="title2", use_cd=False,
    draw_fig=False, parallel=True, extract_range=None,
    betweenness_threshold_coef=1.0, max_c_size=6, min_c_size=2)

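These statements call dataset2featurefile several times. Apart from the first call, which sets extract_range to range(2), the later calls all look the same to me. Is this necessary? When actually running the script, can I keep just one of the calls? If so, should I keep the one with extract_range=range(2) or one with extract_range=None? Thank you!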

BangLiu commented 4 years ago

Hi, I process the dataset four times for two reasons:

  1. We test on two datasets: the same-event dataset and the same-story dataset.
  2. We want to measure the influence of community detection, so we set "use_cd" to true or false for each dataset. This gives 4 different parameter combinations.
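
To make the difference between the four calls easier to see, here is a minimal sketch that builds the same four (input file, output file, use_cd) combinations in a loop. The names `RAW_DIR`, `OUT_DIR`, and `build_runs` are hypothetical and not part of the repository; the paths and file suffixes are copied from the calls quoted in the question. The sketch only prints the combinations and does not call dataset2featurefile itself.

```python
from itertools import product

# Directory prefixes copied from the calls quoted in the question.
RAW_DIR = "../../../../data/raw/event-story-cluster"
OUT_DIR = "../../../../data/processed/event-story-cluster"


def build_runs():
    """Return one (input_path, output_path, use_cd) tuple per experiment.

    Hypothetical helper: 2 datasets x 2 use_cd settings = 4 runs.
    """
    datasets = ["same_event_doc_pair", "same_story_doc_pair"]
    runs = []
    # use_cd varies slowest, which reproduces the order of the original calls.
    for use_cd, dataset in product([True, False], datasets):
        suffix = "cd" if use_cd else "no_cd"
        in_path = f"{RAW_DIR}/{dataset}.txt"
        out_path = f"{OUT_DIR}/{dataset}.{suffix}.json"
        runs.append((in_path, out_path, use_cd))
    return runs


for in_path, out_path, use_cd in build_runs():
    # Each tuple would supply the first two positional arguments and the
    # use_cd keyword of one dataset2featurefile call.
    print(f"use_cd={use_cd}: {in_path} -> {out_path}")
```

Each run writes to a different output file, so dropping any of the four calls would leave one dataset and use_cd combination unprocessed.
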
jc-ryan commented 4 years ago

Sorry, I mistakenly thought they were the same, thanks.