Hello!
I tried to run your code on the CoNLL-03 NER dataset, but the performance I get is not as good as BiLSTM-CRF. Could you help me find the bug? Thanks.
Here is part of my experiment log.
True
Seed num: 42
MODEL: train
Load pretrained word embedding, norm: False, dir: ../Data/pretrain_emb/glove.6B.100d.txt
Embedding:
pretrain word:400000, perfect match:11415, case_match:11656, oov:2234, oov%:0.08827945941673912
Training model...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
I/O:
Tag scheme: BMES
MAX SENTENCE LENGTH: 250
MAX WORD LENGTH: -1
Number normalized: True
Word alphabet size: 25306
Char alphabet size: 78
Label alphabet size: 18
Word embedding dir: ../Data/pretrain_emb/glove.6B.100d.txt
Char embedding dir: None
Word embedding size: 100
Char embedding size: 30
Norm word emb: False
Norm char emb: False
Train file directory: ../Data/conll03/conll03.train.bmes
Dev file directory: ../Data/conll03/conll03.dev.bmes
Test file directory: ../Data/conll03/conll03.test.bmes
Raw file directory: None
Dset file directory: None
Model file directory: save/label_embedding
Loadmodel directory: None
Decode file directory: None
Train instance number: 14987
Dev instance number: 3466
Test instance number: 3684
Raw instance number: 0
FEATURE num: 0
++++++++++++++++++++++++++++++++++++++++
Model Network:
Model use_crf: False
Model word extractor: LSTM
Model use_char: True
Model char extractor: LSTM
Model char_hidden_dim: 50
++++++++++++++++++++++++++++++++++++++++
Training:
Optimizer: SGD
Iteration: 100
BatchSize: 10
Average batch loss: False
++++++++++++++++++++++++++++++++++++++++
Hyperparameters:
Hyper lr: 0.01
Hyper lr_decay: 0.04
Hyper HP_clip: None
Hyper momentum: 0.9
Hyper l2: 1e-08
Hyper hidden_dim: 400
Hyper dropout: 0.5
Hyper lstm_layer: 4
Hyper bilstm: True
Hyper GPU: True
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
build network...
use_char: True
char feature extractor: LSTM
word feature extractor: LSTM
build word sequence feature extractor: LSTM...
build word representation...
build char sequence feature extractor: LSTM ...
--------pytorch total params--------
9849140
Epoch: 0/100
Learning rate is set as: 0.01
Instance: 14987; Time: 125.29s; loss: 2452.2396; acc: 172887.0/204567.0=0.8451
Epoch: 0 training finished. Time: 125.29s, speed: 119.62st/s, total loss: 126550.64305019379
totalloss: 126550.64305019379
gold_num = 5942 pred_num = 6508 right_num = 2556
Dev: time: 11.00s, speed: 317.98st/s; acc: 0.9036, p: 0.3927, r: 0.4302, f: 0.4106
gold_num = 5648 pred_num = 6351 right_num = 2261
Test: time: 10.95s, speed: 339.54st/s; acc: 0.8919, p: 0.3560, r: 0.4003, f: 0.3769
Exceed previous best f score: -10
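For what it's worth, the dev P/R/F printed above are consistent with the standard entity-level formulas applied to the logged gold_num/pred_num/right_num counts, so the evaluation itself seems fine and the gap looks like a training issue. A quick sanity check in plain Python, with the counts copied from the dev line of the log:

```python
# Sanity check: recompute dev precision/recall/F1 from the logged counts.
gold_num, pred_num, right_num = 5942, 6508, 2556  # dev counts from the log

p = right_num / pred_num   # precision = 2556/6508 ≈ 0.3927
r = right_num / gold_num   # recall    = 2556/5942 ≈ 0.4302
f = 2 * p * r / (p + r)    # F1        ≈ 0.4106

print(f"p: {p:.4f}, r: {r:.4f}, f: {f:.4f}")  # matches the log's dev line
```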
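And in case it matters, my understanding is that the "pytorch total params" line counts trainable parameters in the usual PyTorch way. A minimal, self-contained sketch of that counting idiom (the LSTM below is a stand-in with roughly my logged dimensions, not the repo's actual network, so its count will not match 9849140):

```python
import torch.nn as nn

# Stand-in model: 4-layer BiLSTM, 100-dim inputs, 400 total hidden
# (200 per direction), mirroring the hyperparameters in the log.
model = nn.LSTM(input_size=100, hidden_size=200, num_layers=4,
                bidirectional=True, batch_first=True)

# Count all trainable parameters, the way a "total params" line
# is typically produced.
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(total_params)
```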