INK-USC / shifted-label-distribution

Source code for paper "Looking Beyond Label Noise: Shifted Label Distribution Matters in Distantly Supervised Relation Extraction" (EMNLP 2019)
https://arxiv.org/abs/1904.09331
Apache License 2.0

what batch_size do you use for neural models #3

Open ShellingFord221 opened 3 years ago

ShellingFord221 commented 3 years ago

Hi, I wonder what batch_size you use for each neural model? It is really hard to reproduce the results in Table 3 without knowing this parameter...

cherry979988 commented 3 years ago

Hi @ShellingFord221

We did not tune the learning rate or batch size specifically; please use the default settings in the args (lr=1.0 with decay, batch_size=64). We did tune the dropout for each dataset and model. Please try the following settings (also collected in the Python snippet after the lists) and let me know if you have further questions.

KBP BiGRU: input_dropout 0.5, intra_dropout 0.1, output_dropout 0.6
KBP BiLSTM: input_dropout 0.5, intra_dropout 0.2, output_dropout 0.8
KBP PA-LSTM: input_dropout 0.5, intra_dropout 0.1, output_dropout 0.5
KBP PCNN: dropout 0.2
KBP CNN: dropout 0.3

NYT BiGRU: input_dropout 0.5, intra_dropout 0.3, output_dropout 0.5
NYT BiLSTM: input_dropout 0.4, intra_dropout 0.3, output_dropout 0.7
NYT PA-LSTM: input_dropout 0.5, intra_dropout 0.3, output_dropout 0.7
NYT PCNN: dropout 0.2
NYT CNN: dropout 0.2

TACRED BiGRU: input_dropout 0.6, intra_dropout 0.1, output_dropout 0.6
TACRED BiLSTM: input_dropout 0.7, intra_dropout 0.1, output_dropout 0.7
TACRED PA-LSTM: input_dropout 0.7, intra_dropout 0.3, output_dropout 0.6
TACRED PCNN: dropout 0.4
TACRED CNN: dropout 0.3
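For convenience, here are the same settings as a Python dict. This is only a minimal sketch: the model keys and the in/intra/out grouping are illustrative names, not the repo's actual --model values or flag names, so map them yourself onto --in_drop / --intra_drop / --out_drop for the RNN models and --dropout for CNN/PCNN.

# Recommended dropout settings from this thread, keyed by (dataset, model).
# RNN-style models take (input, intra, output) dropout; CNN/PCNN take a single value.
# The keys below are illustrative labels, not the repo's --model argument values.
DROPOUT = {
    ('KBP', 'BiGRU'):      {'in': 0.5, 'intra': 0.1, 'out': 0.6},
    ('KBP', 'BiLSTM'):     {'in': 0.5, 'intra': 0.2, 'out': 0.8},
    ('KBP', 'PA-LSTM'):    {'in': 0.5, 'intra': 0.1, 'out': 0.5},
    ('KBP', 'PCNN'):       {'dropout': 0.2},
    ('KBP', 'CNN'):        {'dropout': 0.3},
    ('NYT', 'BiGRU'):      {'in': 0.5, 'intra': 0.3, 'out': 0.5},
    ('NYT', 'BiLSTM'):     {'in': 0.4, 'intra': 0.3, 'out': 0.7},
    ('NYT', 'PA-LSTM'):    {'in': 0.5, 'intra': 0.3, 'out': 0.7},
    ('NYT', 'PCNN'):       {'dropout': 0.2},
    ('NYT', 'CNN'):        {'dropout': 0.2},
    ('TACRED', 'BiGRU'):   {'in': 0.6, 'intra': 0.1, 'out': 0.6},
    ('TACRED', 'BiLSTM'):  {'in': 0.7, 'intra': 0.1, 'out': 0.7},
    ('TACRED', 'PA-LSTM'): {'in': 0.7, 'intra': 0.3, 'out': 0.6},
    ('TACRED', 'PCNN'):    {'dropout': 0.4},
    ('TACRED', 'CNN'):     {'dropout': 0.3},
}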

cc: @Milozms

ShellingFord221 commented 3 years ago

Hi, I tried the parameters recommended above. Here are the results of CNN_KBP:

Original 27.35±1.42
Max-thres 31.20±2.04
Entropy-thres 32.18±2.39
BA-Set 28.97±0.75

The results of Original and BA-Set seem a little worse than those reported in Table 3. Here is my command:

python Neural/train.py --model cnn --data_dir data/neural/KBP --hidden 230 --lr 1.0 --in_drop 0.3 --info cnn_kbp --repeat 5

Thanks!

ShellingFord221 commented 3 years ago

I also tried PCNN_KBP with dropout 0.2:

Original 27.92±1.10
Max-thres 31.28±2.10
Entropy-thres 32.74±3.28
BA-Set 30.26±1.10

Similarly, Original and BA-Set are a little worse than the results reported in Table 3.
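(For reference, each ± number above is the mean and standard deviation over the 5 runs from --repeat 5. Below is a minimal sketch of that aggregation, assuming the per-run test F1 scores are collected by hand from the logs; the repo may already print this summary itself.)

import statistics

# Hypothetical per-run test F1 scores for one setting (5 runs from --repeat 5).
f1_runs = [26.1, 27.9, 28.4, 26.5, 27.8]

print(f"{statistics.mean(f1_runs):.2f}±{statistics.stdev(f1_runs):.2f}")  # mean ± sample std dev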

cherry979988 commented 3 years ago

Hi @ShellingFord221

Could you please try adding --lower to the command for the cnn/pcnn models?

ShellingFord221 commented 3 years ago

Hi, I tried the following command:

python Neural/train.py --model cnn --data_dir data/neural/KBP --hidden 230 --lower --lr 1.0 --in_drop 0.3 --info cnn_kbp --repeat 5

and here is what I get:

Original 26.25±2.08
Max-thres 30.16±3.21
Entropy-thres 31.71±3.49
BA-Set 29.64±3.38

It seems that Original is getting worse... Here are all my parameters:

Namespace(attn_dim=200, batch_size=64, bias=True, bidirectional=True, cpu=False, cuda=True, data_dir='data/neural/KBP', emb_dim=300, fix_bias=False, hidden=230, in_drop=0.3, info='cnn_kbp', intra_drop=0.1, lower=True, lr=1.0, lr_decay=0.9, mask_with_type=True, max_grad_norm=5.0, model='cnn', ner_dim=30, num_epoch=30, num_layers=2, out_drop=0.6, pos_dim=30, position_dim=30, repeat=5, save_dir='./dumped_models', seed=7698, state_drop=0.5, vocab_dir='data/neural/vocab', vocab_size=33826, window_size=3)

Thank you so much!!

cherry979988 commented 3 years ago

Hi @ShellingFord221

Please try --lr_decay 0.1; this should solve the problem. I should have made 0.1 the default value in the code.

python Neural/train.py --model cnn --data_dir data/neural/KBP --hidden 230 --lr 1.0 --lr_decay 0.1 --dropout 0.3 --info cnn_kbp --repeat 5 --lower     
python Neural/train.py --model pcnn --data_dir data/neural/KBP --hidden 230 --lr 1.0 --lr_decay 0.1 --dropout 0.2 --info pcnn_kbp --repeat 5 --lower
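(Rough intuition for why 0.1 matters: with multiplicative decay the learning rate after n decay steps is lr * decay**n, so a factor of 0.1 shrinks it far more aggressively than the previous default of 0.9. The sketch below only illustrates that contrast, assuming one decay step per epoch; the repo's actual trigger, e.g. decaying only when the dev score stops improving, may differ.)

# Compare lr trajectories under the two decay factors (illustrative, not repo code).
for decay in (0.9, 0.1):
    lr = 1.0
    trajectory = []
    for epoch in range(5):
        lr *= decay              # assumed: one multiplicative decay step per epoch
        trajectory.append(round(lr, 5))
    print(decay, trajectory)     # 0.9 -> [0.9, 0.81, ...]; 0.1 -> [0.1, 0.01, ...]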

Thanks for bringing up this issue!

ShellingFord221 commented 3 years ago

Hi, I tried all the recommended parameters. PCNN, Bi-GRU, and Bi-LSTM all achieve the desired performance. However, the set_bias method of CNN seems to underperform the original model on KBP:

original: 30.48
set_bias: 28.17
fix_bias: 35.19
Thanks!