Hi,

First of all, thanks for your last reply. Following your suggestion, I ran the model on OntoNotes v5.0. Although your official F1 score is 88.16%, I always get about 85%. When I run your model on UD, I get very good performance, so I think I have made a mistake somewhere.
Here is my command:

```shell
python main.py --learning_rate 0.01 --lr_decay 0.035 --dropout 0.5 --hidden_dim 400 --lstm_layer 4 --momentum 0.9 --whether_clip_grad True --clip_grad 5.0 --train_dir 'data/onto.train.txt' --dev_dir 'data/onto.development.txt' --test_dir 'data/onto.test.txt' --model_dir 'model/' --word_emb_dir 'glove.6B.100d.txt'
```
Here is the data summary:

```
DATA SUMMARY START:
 I/O:
     Tag scheme: BIO
     MAX SENTENCE LENGTH: 250
     MAX WORD LENGTH: -1
     Number normalized: False
     Word alphabet size: 69812
     Char alphabet size: 119
     Label alphabet size: 38
     Word embedding dir: glove.6B.100d.txt
     Char embedding dir: None
     Word embedding size: 100
     Char embedding size: 30
     Norm word emb: False
     Norm char emb: False
     Train file directory: data/onto.train.txt
     Dev file directory: data/onto.development.txt
     Test file directory: data/onto.test.txt
     Raw file directory: None
     Dset file directory: None
     Model file directory: model/
     Loadmodel directory: None
     Decode file directory: None
     Train instance number: 115812
     Dev instance number: 15679
     Test instance number: 12217
     Raw instance number: 0
     FEATURE num: 0
 ++++++++++++++++++++++++++++++++++++++++
 Model Network:
     Model use_crf: False
     Model word extractor: LSTM
     Model use_char: True
     Model char extractor: LSTM
     Model char_hidden_dim: 50
 ++++++++++++++++++++++++++++++++++++++++
 Training:
     Optimizer: SGD
     Iteration: 100
     BatchSize: 10
     Average batch loss: False
 ++++++++++++++++++++++++++++++++++++++++
 Hyperparameters:
     Hyper lr: 0.01
     Hyper lr_decay: 0.035
     Hyper HP_clip: 5.0
     Hyper momentum: 0.9
     Hyper l2: 1e-08
     Hyper hidden_dim: 400
     Hyper dropout: 0.5
     Hyper lstm_layer: 4
     Hyper bilstm: True
     Hyper GPU: True
DATA SUMMARY END.
```
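In case the gap comes from a difference in evaluation rather than training, here is a minimal sketch of how span-level F1 is conventionally computed for BIO-tagged output (exact-match spans, micro-averaged, in the style of conlleval). The helper names are my own, not from this repo, and this is only an illustration of the standard metric, not your evaluation script:

```python
def bio_spans(tags):
    """Extract (label, start, end) spans from a BIO tag sequence.

    `end` is exclusive. An I- tag that does not continue the current
    label is treated as the start of a new span (conlleval-style).
    """
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        boundary = (
            tag == "O"
            or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != label)
        )
        if boundary:
            if label is not None:
                spans.append((label, start, i))
            if tag.startswith(("B-", "I-")):
                start, label = i, tag[2:]
            else:
                start, label = None, None
    return spans


def span_f1(gold_seqs, pred_seqs):
    """Micro-averaged precision, recall, and F1 over exact span matches."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = set(bio_spans(gold)), set(bio_spans(pred))
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

For example, `bio_spans(["B-PER", "I-PER", "O", "B-LOC"])` yields the two spans `("PER", 0, 2)` and `("LOC", 3, 4)`. If your numbers come from a different scorer (e.g. token-level accuracy or a lenient match), that alone could account for a few points of F1.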
I think I followed the hyperparameters written in your paper. Is there a mistake anywhere?

Thanks for reading.