Closed: OStars closed this issue 1 year ago
Thanks for your interest in our work.
Please notice that the scores reported in the training script are just for selecting the best checkpoint. To get the real scores, after training the model, you have to use either `eval_end2endEE.py` or `eval_pipelineEE.py` for evaluation. In both scripts, there is a `cal_scores` function that calculates `trigger_id` and `trigger_cls`. Please refer to https://github.com/PlusLabNLP/DEGREE/blob/666dd8907717d1cb0ea3692867cd7404e892ce54/degree/eval_pipelineEE.py#L121
Thanks a lot! But I am still confused about why we need to run the other scripts (`eval_end2endEE.py` or `eval_pipelineEE.py`). Is there any difference between the real scores and the scores reported in the training script?
I think I have an idea. I notice the `generate_data_degree_xxx.py` script also samples negative examples for the dev set and test set, which means we only use part of the dev and test sets for evaluation during training. So we can either run the other scripts to get the real scores, or skip sampling negative examples for the dev and test sets in `generate_data_degree_xxx.py`, which will increase the training time a lot. Is that right?
Yes, to reduce the training time, we use internal evaluation for selecting the best checkpoint.
Hello, I found that there is only a `trigger_id` evaluation metric in your code, but you report `trigger_cls` in your paper. I tried to calculate `trigger_cls` like this: But I still get a `trigger_cls` score that is the same as `trigger_id`. How do I get `trigger_cls` correctly?