Yes, this is a bug.
Replacing the loss with self.BCE = nn.BCEWithLogitsLoss(reduction='none') should fix it.
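For reference, here is a minimal sketch of how the loss could be computed once reduction='none' is in place (the class and argument names here are assumptions for illustration, not necessarily identical to the repo's code):

```python
import torch
import torch.nn as nn

class MaskedBCE(nn.Module):
    def __init__(self):
        super().__init__()
        # reduction='none' keeps a per-element loss so the mask can zero out pad positions
        self.BCE = nn.BCEWithLogitsLoss(reduction='none')

    def forward(self, logits, target, mask):
        # logits, target: (batch, seq_len); mask: (batch, seq_len), 1 for real tokens, 0 for pad
        loss = self.BCE(logits, target)                        # per-token loss
        return (loss * mask).sum() / mask.sum().clamp(min=1)   # average over non-pad tokens only
```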
But I guess it did not hurt performance; the model will just learn pad -> 0. Did you get the new score?
Yeah, setting reduction to 'none' did fix the bug. The model may learn pad -> 0, but there could be some uncertainty (I haven't tested it).
There is another question for me, though not a bug: which git commit should I use to reproduce the nyt_seq2umt_spo results from the paper?
I see that you committed saved_experiments/nyt_seq2umt_spo.log at commit ca418db, so I ran the code at ca418db, where the 'lib' directory stores the 'openjere' code, and I got the following log:
2021-03-24 22:35:15,881 - seq2umt
2021-03-24 22:59:28,245 - precision: 0.6498, recall: 0.6700, fscore: 0.6598 ||
2021-03-24 23:08:19,803 - precision: 0.7130, recall: 0.6460, fscore: 0.6778 ||
...
2021-03-25 05:53:16,296 - precision: 0.8283, recall: 0.5624, fscore: 0.6699 ||
2021-03-25 05:53:16,297 - best epoch: 36 F1 = 0.74
2021-03-25 05:54:41,306 - precision: 0.7777, recall: 0.6725, fscore: 0.7213 ||
The final F1 score is 0.7213, which differs from the 0.771 reported in the paper.
Here is my config JSON:
{ "dataset": "nyt", "model": "seq2umt", "data_root": "data/nyt/seq2umt_spo", "raw_data_root": "raw_data/nyt", "train": "new_train_data.json", "dev": "new_validate_data.json", "test": "new_test_data.json", "raw_data_list": ["new_test_data.json", "new_train_data.json", "new_validate_data.json"], "relation_vocab": "relation_vocab.json", "print_epoch": 1, "evaluation_epoch":38, "max_text_len": 1000, "cell_name": "lstm", "emb_size": 200, "rel_emb_size": 50, "bio_emb_size": 50, "hidden_size": 200, "threshold": 0.5, "order": ["subject", "predicate", "object"], "activation": "tanh", "optimizer": "adam", "epoch_num":50, "batch_size_train": 32, "batch_size_eval":400, "seperator": " ", "gpu": 1 }
I am also training nyt_seq2umt_spo at commit 634b3b1; it's still running. However, line 2 of my log file differs from the original log:
Original saved_experiments/nyt_seq2umt_spo.log:
2020-04-26 15:15:54,521 - seq2umt
2020-04-26 15:34:36,234 - precision: 0.6921, recall: 0.6473, fscore: 0.6690 ||

My nyt_seq2umt_spo.log:
2021-03-25 09:50:30,021 - seq2umt
2021-03-25 10:14:29,474 - precision: 0.6483, recall: 0.6888, fscore: 0.6679 ||
PS: I will update the final score when training is done.
PPS: I also trained nyt_wdec and got a final score of 0.8011.
Thanks for your answer :)
The score in the paper is not very far away from your results.
I did not use fixed random seeds because the xxr order is very sensitive to random seeds. I want readers to become aware of this through their own experiments and to explore the reasons and possible improvements.
My impression is that the scores (except for xxr) vary within about 0.02.
0.732 is actually from the current commit; please see the log file below.
https://github.com/WindChimeRan/OpenJERE/blob/master/saved_experiments/nyt_seq2umt_spo.log
Sorry, I made a mistake about the nyt-spo score in the paper. My fault. Thanks for answering :)
Thanks for your contribution. I have a question about the MaskedBCE class that came up while reading the code. MaskedBCE uses torch.nn.BCEWithLogitsLoss with its 'reduction' argument set to 'mean'. This means BCEWithLogitsLoss treats pad tokens like ordinary tokens and averages the loss over the whole batch, so applying the mask afterwards cannot eliminate the influence of padding.
I also printed the loss in the forward function of MaskedBCE as follows:
The outputs are the same:
I checked the mask and it does contain 0s (padding positions).
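For illustration, here is a small standalone snippet (with made-up tensors, not the repo's exact code) showing why the mask has no effect once the loss has already been reduced to a scalar, and how reduction='none' changes that:

```python
import torch
import torch.nn as nn

logits = torch.randn(2, 5)                         # (batch, seq_len) raw scores
target = torch.randint(0, 2, (2, 5)).float()
mask = torch.tensor([[1., 1., 1., 0., 0.],
                     [1., 1., 1., 1., 0.]])        # 0 = pad position

# With reduction='mean' the loss is already a scalar that includes the pad
# positions, so multiplying by the mask and renormalizing returns the same value.
mean_loss = nn.BCEWithLogitsLoss(reduction='mean')(logits, target)
masked_mean = (mean_loss * mask).sum() / mask.sum()
print(mean_loss.item(), masked_mean.item())        # identical

# With reduction='none' the loss stays per-token, so the mask actually
# removes the pad positions from the average.
per_token = nn.BCEWithLogitsLoss(reduction='none')(logits, target)
masked_loss = (per_token * mask).sum() / mask.sum()
print(masked_loss.item())                          # generally differs from mean_loss
```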
Configuration
git commit hash: 634b3b1