CogComp / JointConstrainedLearning

Joint Constrained Learning for Event-Event Relation Extraction

Replicating results on MATRES using the pre-trained model #1

Closed ynandwan closed 2 years ago

ynandwan commented 3 years ago

Hi, thank you for making your code and trained models public! I found them very useful, but I am having some difficulty replicating the numbers reported in the paper. I would appreciate it if you could help me and clarify a few doubts I have:

  1. I am getting only a 0.71 micro F1 score on the MATRES test set using the pre-trained weights shared in the repo (0104_3.pt), while the paper reports 0.788 micro F1. The 0.71 score is even lower than the 0.735 F1 reported for Single-Task Learning in Table 5. Could you please confirm that 0104_3.pt is indeed the best model for MATRES? (A sketch of my evaluation loop is after this list.)
  2. The best model suggested for MATRES in predict.py (0104_3.pt) corresponds to just a fine-tuned RoBERTa (roberta_mlp), not the LSTM one. Moreover, the LSTM model does not use the POS tags as stated in the paper: it only receives embeddings from the RoBERTa model, computed in the function exp.my_func.
  3. In addition, the architecture described in the paper encodes common-sense features through an additional MLP, but I couldn't find these features in either of the two models in the model.py module. Is there a separate module for this that I am missing?
  4. Lastly, I couldn't find the Global Inference (ILP) formulation in the code. Could you please point me towards it?
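
For reference, here is roughly how I am computing the 0.71 number. This is a minimal sketch of my own evaluation loop (the DataLoader and checkpoint handling are my assumptions, not the repo's code); it scores standard scikit-learn micro F1 over all four classes, VAGUE included:

```python
import torch
from sklearn.metrics import f1_score

def micro_f1_on_matres(model, test_loader, device="cpu"):
    """Plain 4-class micro F1, VAGUE included; this is what gives me ~0.71."""
    model.to(device).eval()
    gold, pred = [], []
    with torch.no_grad():
        for inputs, labels in test_loader:  # my own MATRES test DataLoader (hypothetical)
            logits = model(inputs.to(device))
            pred.extend(logits.argmax(dim=-1).cpu().tolist())
            gold.extend(labels.tolist())
    return f1_score(gold, pred, average="micro")
```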

Any help/pointers would be really appreciated!

Thanks,

Lastdier commented 2 years ago

0104_3.pt isn't in the repo; I can't find it. Was it generated by your code?

ynandwan commented 2 years ago

The link to the pre-trained models is in the README. Pasting it here as well: https://drive.google.com/drive/folders/1PyNAlNHY144pGsko9iYxwYlqf4ud0Lq1

ddawsari commented 2 years ago

Did you reach any conclusion on this? I'm also getting the same F1 score you reported with 0104_3.pt (0.71).

ynandwan commented 2 years ago

Nope. Didn't hear back from the authors.

why2011btv commented 2 years ago

The F1 score reported in the paper is not the regular micro F1 for 4-class classification. I calculated it the same way as the EMNLP 2019 paper by Qiang Ning et al. (see Appendix A on page 5); here is the link to their paper: https://arxiv.org/pdf/1909.00429.pdf
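
Roughly, the idea (as I read that appendix) is that VAGUE is treated as "no relation" and excluded from both the precision and recall denominators. A minimal sketch, with illustrative label names rather than the ids used in this repo:

```python
def relaxed_f1(gold, pred, vague="VAGUE"):
    """Relaxed F1 in the style of Ning et al. 2019, Appendix A (my reading):
    VAGUE acts as 'no relation' and is excluded from both denominators."""
    correct = sum(g == p and g != vague for g, p in zip(gold, pred))
    pred_pos = sum(p != vague for p in pred)  # predicted non-VAGUE pairs
    gold_pos = sum(g != vague for g in gold)  # gold non-VAGUE pairs
    precision = correct / pred_pos if pred_pos else 0.0
    recall = correct / gold_pos if gold_pos else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```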

For how to obtain the common-sense features, please refer to this repo: https://github.com/qiangning/NeuralTemporalRelation-EMNLP19
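
For intuition only, this is a generic PyTorch sketch of the kind of fusion the paper describes: the common-sense feature vector passes through a small MLP and is concatenated with the two event representations before the classifier. All module names and dimensions here are assumptions, not this repo's actual code:

```python
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    """Hypothetical event-pair classifier with a common-sense MLP branch."""
    def __init__(self, hidden_dim=1024, cs_dim=15, cs_hidden=32, num_labels=4):
        super().__init__()
        # Small MLP encoding the common-sense feature vector.
        self.cs_mlp = nn.Sequential(nn.Linear(cs_dim, cs_hidden), nn.ReLU())
        self.out = nn.Linear(2 * hidden_dim + cs_hidden, num_labels)

    def forward(self, e1_vec, e2_vec, cs_feat):
        # Concatenate both event representations with the encoded features.
        cs = self.cs_mlp(cs_feat)
        return self.out(torch.cat([e1_vec, e2_vec, cs], dim=-1))
```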

Due to licensing issues with Gurobi, the global inference (ILP) optimization is not included in our repo.
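
For anyone who wants to reproduce it, here is a minimal sketch of a transitivity-constrained ILP using the open-source PuLP/CBC stack instead of Gurobi. The label set and transitivity table below are a toy subset; the constraint set in the paper is richer:

```python
import itertools
import pulp

LABELS = ["BEFORE", "AFTER", "EQUAL", "VAGUE"]
# TRANS[(r1, r2)]: labels allowed for (i, k) given (i, j)=r1 and (j, k)=r2.
# Toy subset; the paper's induction table covers more combinations.
TRANS = {("BEFORE", "BEFORE"): {"BEFORE"}, ("AFTER", "AFTER"): {"AFTER"}}

def global_inference(scores):
    """scores[(i, j)][r]: local score (e.g., log-probability) of relation r."""
    pairs = list(scores)
    prob = pulp.LpProblem("temporal_ilp", pulp.LpMaximize)
    y = {(p, r): pulp.LpVariable(f"y_{p[0]}_{p[1]}_{r}", cat="Binary")
         for p in pairs for r in LABELS}
    # Objective: total score of the selected labels.
    prob += pulp.lpSum(scores[p][r] * y[p, r] for p in pairs for r in LABELS)
    # Exactly one label per event pair.
    for p in pairs:
        prob += pulp.lpSum(y[p, r] for r in LABELS) == 1
    # Transitivity: if (i,j)=r1 and (j,k)=r2, then (i,k) must take an allowed label.
    events = {e for p in pairs for e in p}
    for i, j, k in itertools.permutations(events, 3):
        if all(q in scores for q in [(i, j), (j, k), (i, k)]):
            for r1, r2 in itertools.product(LABELS, repeat=2):
                allowed = TRANS.get((r1, r2), set(LABELS))
                prob += (y[(i, j), r1] + y[(j, k), r2]
                         - pulp.lpSum(y[(i, k), r3] for r3 in allowed)) <= 1
    prob.solve(pulp.PULP_CBC_CMD(msg=0))
    return {p: max(LABELS, key=lambda r: y[p, r].value()) for p in pairs}
```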