Nealcly / BiLSTM-LAN

Hierarchically-Refined Label Attention Network for Sequence Labeling

Experiments on CoNLL03NER #9

Open SUDA-HLT-ywfang opened 4 years ago

SUDA-HLT-ywfang commented 4 years ago

Hello! I tried to run your code on the CoNLL03 NER dataset, but the performance I get is not as good as BiLSTM-CRF. Could you help me find the bug? Thanks. Here is part of my experiment log.

True
Seed num: 42
MODEL: train
Load pretrained word embedding, norm: False, dir: ../Data/pretrain_emb/glove.6B.100d.txt
Embedding: pretrain word:400000, prefect match:11415, case_match:11656, oov:2234, oov%:0.08827945941673912
Training model...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DATA SUMMARY START:
I/O:
Tag scheme: BMES
MAX SENTENCE LENGTH: 250
MAX WORD LENGTH: -1
Number normalized: True
Word alphabet size: 25306
Char alphabet size: 78
Label alphabet size: 18
Word embedding dir: ../Data/pretrain_emb/glove.6B.100d.txt
Char embedding dir: None
Word embedding size: 100
Char embedding size: 30
Norm word emb: False
Norm char emb: False
Train file directory: ../Data/conll03/conll03.train.bmes
Dev file directory: ../Data/conll03/conll03.dev.bmes
Test file directory: ../Data/conll03/conll03.test.bmes
Raw file directory: None
Dset file directory: None
Model file directory: save/label_embedding
Loadmodel directory: None
Decode file directory: None
Train instance number: 14987
Dev instance number: 3466
Test instance number: 3684
Raw instance number: 0
FEATURE num: 0
++++++++++++++++++++++++++++++++++++++++
Model Network:
Model use_crf: False
Model word extractor: LSTM
Model use_char: True
Model char extractor: LSTM
Model char_hidden_dim: 50
++++++++++++++++++++++++++++++++++++++++
Training:
Optimizer: SGD
Iteration: 100
BatchSize: 10
Average batch loss: False
++++++++++++++++++++++++++++++++++++++++
Hyperparameters:
Hyper lr: 0.01
Hyper lr_decay: 0.04
Hyper HP_clip: None
Hyper momentum: 0.9
Hyper l2: 1e-08
Hyper hidden_dim: 400
Hyper dropout: 0.5
Hyper lstm_layer: 4
Hyper bilstm: True
Hyper GPU: True
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
build network...
use_char: True
char feature extractor: LSTM
word feature extractor: LSTM
build word sequence feature extractor: LSTM...
build word representation...
build char sequence feature extractor: LSTM ...
--------pytorch total params--------
9849140
Epoch: 0/100
Learning rate is set as: 0.01
Instance: 14987; Time: 125.29s; loss: 2452.2396; acc: 172887.0/204567.0=0.8451
Epoch: 0 training finished. Time: 125.29s, speed: 119.62st/s, total loss: 126550.64305019379
totalloss: 126550.64305019379
gold_num = 5942 pred_num = 6508 right_num = 2556
Dev: time: 11.00s, speed: 317.98st/s; acc: 0.9036, p: 0.3927, r: 0.4302, f: 0.4106
gold_num = 5648 pred_num = 6351 right_num = 2261
Test: time: 10.95s, speed: 339.54st/s; acc: 0.8919, p: 0.3560, r: 0.4003, f: 0.3769
Exceed previous best f score: -10
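For reference, the p/r/f values in the log are standard span-level precision, recall, and F1 computed from the gold_num / pred_num / right_num counts it prints. A minimal sketch that reproduces the Dev line above (the function name is mine, not from the repo):

def prf(gold_num: int, pred_num: int, right_num: int):
    """Span-level precision, recall, and F1 from entity counts."""
    p = right_num / pred_num if pred_num else 0.0  # fraction of predicted spans that are correct
    r = right_num / gold_num if gold_num else 0.0  # fraction of gold spans that were found
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Dev counts from the log: gold_num = 5942, pred_num = 6508, right_num = 2556
print(prf(5942, 6508, 2556))  # -> (0.3927..., 0.4302..., 0.4106...)

So the low F1 comes from very low span precision/recall, not from the token accuracy (0.9036), which looks deceptively high.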

TianlinZhang668 commented 4 years ago

Hmm, I got an F1 of 89, and I used the BIO format.
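Since the log above shows "Tag scheme: BMES" while this run used BIO, the tag-scheme conversion is one place to check when comparing results. A minimal sketch of BIO-to-BMES conversion, assuming well-formed B-X/I-X/O input (the helper name is hypothetical, not from this repo):

def bio_to_bmes(tags):
    """Convert a BIO tag sequence to BMES (B=begin, M=middle, E=end, S=single)."""
    bmes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bmes.append("O")
            continue
        prefix, etype = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == "I-" + etype  # does the next token extend this entity?
        if prefix == "B":
            bmes.append(("B-" if continues else "S-") + etype)
        else:  # "I": middle of a span, or its end if nothing follows
            bmes.append(("M-" if continues else "E-") + etype)
    return bmes

print(bio_to_bmes(["B-PER", "I-PER", "O", "B-LOC"]))
# -> ['B-PER', 'E-PER', 'O', 'S-LOC']

A mismatch between the tag scheme of the data files and the one the evaluation script expects would produce exactly the kind of collapsed span F1 seen in the log.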

TianlinZhang668 commented 4 years ago

But on some Chinese datasets, the results are also low.