131250208 / TPlinker-joint-extraction


Roughly how many epochs does TPLinker need to converge on WebNLG? #46

Closed. NONOThingC closed this issue 3 years ago

NONOThingC commented 3 years ago

Hello, I'm running your TPLinker (not the plus version) on WebNLG, but after 100+ epochs the head_rel and tail_rel accuracies are still 0. I saw in another issue that you used batch_size = 6 and 200 epochs. Roughly how many epochs does it take on this dataset before head_rel / tail_rel accuracy clearly starts to improve? Apart from batch_size, my parameters seem identical to the ones you provide, and a different batch_size alone shouldn't make this much of a difference, should it? My config is below.

Issue #24 is about NYT, but my results on NYT are actually quite good; it's only WebNLG that looks wrong, so I opened a separate issue.

common = {
    "exp_name": "webnlg",
    "rel2id": "rel2id.json",
    "device_num": 0,
#     "encoder": "BiLSTM",
    "encoder": "BERT", 
    "hyper_parameters": {
        "shaking_type": "cat", # cat, cat_plus, cln, cln_plus; Experiments show that cat/cat_plus work better with BiLSTM, while cln/cln_plus work better with BERT. The results in the paper are produced by "cat". So, if you want to reproduce the results, "cat" is enough, no matter for BERT or BiLSTM.
        "inner_enc_type": "lstm", # valid only if cat_plus or cln_plus is set. It is the way how to encode inner tokens between each token pairs. If you only want to reproduce the results, just leave it alone.
        "dist_emb_size": -1, # -1: do not use distance embedding; other number: need to be larger than the max_seq_len of the inputs. set -1 if you only want to reproduce the results in the paper.
        "ent_add_dist": False, # set true if you want add distance embeddings for each token pairs. (for entity decoder)
        "rel_add_dist": False, # the same as above (for relation decoder)
        "match_pattern": "whole_text", # only_head_text (nyt_star, webnlg_star), whole_text (nyt, webnlg), only_head_index, whole_span
    },
}
common["run_name"] = "{}+{}+{}".format("TP1", common["hyper_parameters"]["shaking_type"], common["encoder"]) + ""

run_id = ''.join(random.sample(string.ascii_letters + string.digits, 8))
train_config = {
    "train_data": "train_data.json",
    "valid_data": "valid_data.json",
    "rel2id": "rel2id.json",
    "logger": "wandb", # if wandb, comment the following four lines

#     # if logger is set as default, uncomment the following four lines
#     "logger": "default", 
#     "run_id": run_id,
#     "log_path": "./default_log_dir/default.log",
#     "path_to_save_model": "./default_log_dir/{}".format(run_id),

    # only save the model state dict if F1 score surpasses <f1_2_save>
    "f1_2_save": 0, 
    # whether to train from scratch
    "fr_scratch": True,
    # write down notes here if you want; they will be logged
    "note": "start from scratch",
    # if not training from scratch, set model_state_dict_path below
    "model_state_dict_path": "",
    "hyper_parameters": {
        "batch_size": 24,
        "epochs": 200,
        "seed": 2333,
        "log_interval": 10,
        "max_seq_len": 100,
        "sliding_len": 20,
        "loss_weight_recover_steps": 6000, # to speed up the training process, the loss of EH-to-ET sequence is set higher than other sequences at the beginning, but it will recover in <loss_weight_recover_steps> steps.
        "scheduler": "CAWR", # Step
    },
}
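
Regarding the loss_weight_recover_steps comment above: below is a minimal, hypothetical sketch of such a warm-up schedule (not the repo's exact code), assuming the EH-to-ET (entity) loss starts at full weight and decays linearly to an equal share over <loss_weight_recover_steps> optimizer steps, with three tag sequences in total.

def entity_loss_weight_schedule(step, recover_steps=6000, n_seq=3):
    # z decays linearly from 1 at step 0 to 0 at step == recover_steps
    z = max((recover_steps - step) / recover_steps, 0.0)
    w_ent = z * 1.0 + (1.0 - z) * (1.0 / n_seq)  # EH-to-ET weight: 1 -> 1/3
    w_rel = (1.0 - w_ent) / (n_seq - 1)          # head-rel and tail-rel split the remainder
    return w_ent, w_rel, w_rel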

eval_config = {
    "model_state_dict_dir": "./default_log_dir", # if use wandb, set "./wandb", or set "./default_log_dir" if you use default logger
    "run_ids": ["DGKhEFlH", ],
    "last_k_model": 1,
    "test_data": "*test*.json", # "*test*.json"

    # where to save results
    "save_res": False,
    "save_res_dir": "../results",

    # score: set true only if test set is annotated with ground truth
    "score": True,

    "hyper_parameters": {
        "batch_size": 32,
        "force_split": False,
        "max_test_seq_len": 512,
        "sliding_len": 50,
    },
}
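
On the max_seq_len / max_test_seq_len and sliding_len settings above: the idea is to split inputs longer than the window into overlapping chunks. A rough, hypothetical illustration (not the repo's preprocessing function, and assuming sliding_len is the stride between consecutive windows):

def sliding_windows(tokens, max_seq_len=100, sliding_len=20):
    # Cut `tokens` into overlapping windows of at most `max_seq_len` tokens,
    # shifting the window start forward by `sliding_len` tokens each time.
    windows = []
    for start in range(0, len(tokens), sliding_len):
        windows.append(tokens[start:start + max_seq_len])
        if start + max_seq_len >= len(tokens):
            break
    return windows
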
yc0815024 commented 2 years ago

How did you end up solving this? I'm running into the same problem.

131250208 commented 2 years ago

The WebNLG dataset is very small. When you increase the batch size, not only do larger batches converge more slowly in general, but the number of training steps per epoch also shrinks accordingly. With batch_size = 6 it should converge at around epoch 30; quadrupling the batch size cuts the steps per epoch to a quarter, so if the total number of steps needed for convergence stays the same, you should expect convergence only at around epoch 120 at the earliest. Other hyperparameters may also need adjustment; the exact cause has to be analyzed case by case.
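
The scaling argument above boils down to one line of arithmetic. A back-of-the-envelope sketch (hypothetical helper, assuming convergence needs a roughly fixed number of optimizer steps regardless of batch size):

def estimated_convergence_epoch(batch_size, ref_batch_size=6, ref_epoch=30):
    # Fewer steps per epoch at a larger batch size means proportionally
    # more epochs are needed to reach the same total number of steps.
    return ref_epoch * batch_size / ref_batch_size

print(estimated_convergence_epoch(24))  # -> 120.0, i.e. at least ~120 epochs at batch_size = 24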