allanj / ner_with_dependency

A question about reproduction #10

Closed speedcell4 closed 4 years ago

speedcell4 commented 4 years ago

Hello.

About the Chinese corpus: in Table 4, you reported that the naive BiLSTM-CRF (L = 2) can reach 76.61 F1. I have tried to reproduce that with my own implementation, but I cannot get that number. Could you please tell me how to reproduce it with your implementation? What's the command to do that?

allanj commented 4 years ago

Hi, I re-ran my code after receiving your question.

I obtained Precision: 78.31, Recall: 74.82, F1: 76.53.

My arguments are simply `--dataset=ontonotes_chinese --embedding_file=cc.zh.300.vec --num_lstm_layer=2`; other arguments remain at their defaults.

  1. Make sure you have the correct split for `ontonotes_chinese`. You should have exactly the same number of sentences and entities as indicated in Table 2. Let me know if you need help with the datasets. (A quick counting sketch follows after this list.)
  2. The embedding is the FastText embedding, which can be downloaded.
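
A minimal counting sketch for point 1 (my own, not from the repo): it assumes the `*.sd.conllx` files separate sentences with blank lines and keep the BIOES NER tag in the last column of each token line, which matches the label set printed in the training log. Adjust the column index if your files differ.

```python
def count_sentences_and_entities(path):
    """Count blank-line-separated sentences and BIOES entity spans in a CoNLL-X style file."""
    sentences, entities = 0, 0
    in_sentence = False
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line:
                if in_sentence:
                    sentences += 1
                    in_sentence = False
                continue
            in_sentence = True
            tag = line.split()[-1]
            # In BIOES, every entity contributes exactly one B-* or S-* tag.
            if tag.startswith(("B-", "S-")):
                entities += 1
    if in_sentence:  # file may not end with a blank line
        sentences += 1
    return sentences, entities

# Expected from Table 2 (and the log below): train 36487/62543, dev 6083/9104, test 4472/7494
for split in ("train", "dev", "test"):
    print(split, count_sentences_and_entities(f"data/ontonotes_chinese/{split}.sd.conllx"))
```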
speedcell4 commented 4 years ago

Thank you, I will try your command.

allanj commented 4 years ago

> But you reported 77.40 F1 score in Table 4. Could you please explain the gap?

I think I reported 76.6. The 0.08 difference is likely caused by the CUDA version as well as the PyTorch version.

allanj commented 4 years ago

Feel free to re-open the issue if you have any questions.

speedcell4 commented 4 years ago

Sorry, I don't think the number will reach 76.53. Could you please check whether I made a mistake here?

➜  ner_with_dependency git:(master) ✗ cat train.log
100%|██████████| 332648/332648 [01:10<00:00, 4751.30it/s]
100%|██████████| 792550/792550 [00:03<00:00, 219014.32it/s]
100%|██████████| 116117/116117 [00:00<00:00, 170371.28it/s]
100%|██████████| 96780/96780 [00:00<00:00, 219183.13it/s]
mode: train
device: cuda:3
seed: 42
digit2zero: True
dataset: ontonotes_chinese
affix: sd
embedding_file: ~/fasttext/wiki.zh.vec
embedding_dim: 100
optimizer: sgd
learning_rate: 0.01
momentum: 0.0
l2: 1e-08
lr_decay: 0
batch_size: 10
num_epochs: 100
train_num: -1
dev_num: -1
test_num: -1
eval_freq: 4000
eval_epoch: 0
hidden_dim: 200
num_lstm_layer: 2
dep_emb_size: 50
dep_hidden_dim: 200
num_gcn_layers: 1
gcn_mlp_layers: 1
gcn_dropout: 0.5
gcn_adj_directed: 0
gcn_adj_selfloop: 0
gcn_gate: 0
dropout: 0.5
use_char_rnn: 1
dep_model: none
inter_func: mlp
context_emb: none
[Info] remember to chec the root dependency label if changing the data. current: root
reading the pretraing embedding: ~/fasttext/wiki.zh.vec
using GPU... 0
Reading file: data/ontonotes_chinese/train.sd.conllx
number of sentences: 36487, number of entities: 62543
Reading file: data/ontonotes_chinese/dev.sd.conllx
number of sentences: 6083, number of entities: 9104
Reading file: data/ontonotes_chinese/test.sd.conllx
number of sentences: 4472, number of entities: 7494
#labels: 76
label 2idx: {'<PAD>': 0, 'O': 1, 'B-WORK_OF_ART': 2, 'I-WORK_OF_ART': 3, 'E-WORK_OF_ART': 4, 'S-NORP': 5, 'S-EVENT': 6, 'S-LOC': 7, 'S-FAC': 8, 'S-ORG': 9, 'S-GPE': 10, 'B-EVENT': 11, 'I-EVENT': 12, 'E-EVENT': 13, 'S-DATE': 14, 'B-ORG': 15, 'E-ORG': 16, 'S-PERSON': 17, 'B-DATE': 18, 'E-DATE': 19, 'I-DATE': 20, 'B-FAC': 21, 'E-FAC': 22, 'B-QUANTITY': 23, 'E-QUANTITY': 24, 'B-LOC': 25, 'E-LOC': 26, 'S-ORDINAL': 27, 'S-CARDINAL': 28, 'B-TIME': 29, 'I-TIME': 30, 'E-TIME': 31, 'I-FAC': 32, 'I-ORG': 33, 'I-LOC': 34, 'B-GPE': 35, 'E-GPE': 36, 'S-TIME': 37, 'B-LAW': 38, 'I-LAW': 39, 'E-LAW': 40, 'B-PERSON': 41, 'E-PERSON': 42, 'S-PERCENT': 43, 'B-MONEY': 44, 'I-MONEY': 45, 'E-MONEY': 46, 'I-QUANTITY': 47, 'S-LANGUAGE': 48, 'I-GPE': 49, 'S-WORK_OF_ART': 50, 'B-ORDINAL': 51, 'E-ORDINAL': 52, 'B-CARDINAL': 53, 'E-CARDINAL': 54, 'S-QUANTITY': 55, 'B-NORP': 56, 'E-NORP': 57, 'I-PERSON': 58, 'S-PRODUCT': 59, 'I-NORP': 60, 'S-LAW': 61, 'B-PERCENT': 62, 'I-PERCENT': 63, 'E-PERCENT': 64, 'I-CARDINAL': 65, 'S-MONEY': 66, 'I-ORDINAL': 67, 'B-PRODUCT': 68, 'I-PRODUCT': 69, 'E-PRODUCT': 70, 'B-LANGUAGE': 71, 'E-LANGUAGE': 72, 'I-LANGUAGE': 73, '<START>': 74, '<STOP>': 75}
# deplabels:  48
dep label 2idx:  {'self': 0, 'assmod': 1, 'assm': 2, 'root': 3, 'punct': 4, 'ccomp': 5, 'vmod': 6, 'dobj': 7, 'amod': 8, 'nn': 9, 'lobj': 10, 'nsubj': 11, 'advmod': 12, 'prep': 13, 'plmod': 14, 'dep': 15, 'nummod': 16, 'clf': 17, 'conj': 18, 'cc': 19, 'pobj': 20, 'etc': 21, 'top': 22, 'rcmod': 23, 'cpm': 24, 'attr': 25, 'det': 26, 'asp': 27, 'tmod': 28, 'mmod': 29, 'cop': 30, 'prtmod': 31, 'ba': 32, 'pccomp': 33, 'rcomp': 34, 'neg': 35, 'comod': 36, 'loc': 37, 'dvpmod': 38, 'dvpm': 39, 'range': 40, 'ordmod': 41, 'pass': 42, 'lccomp': 43, 'prnmod': 44, 'xsubj': 45, 'erased': 46, 'nsubjpass': 47}
Building the embedding table for vocabulary...
[Info] Use the pretrained word embedding to initialize: 48050 x 300
[Info] 21759 out of 48050 found in the pretrained embedding.
num chars: 4280
num words: 48050
[Info] Building character-level LSTM
[Model Info] Input size to LSTM: 350
[Model Info] LSTM Hidden Size: 200
[Model Info] Dep Method: none, hidden size: 200
[Model Info] Final Hidden Size: 200
Using SGD: lr is: 0.01, L2 regularization is: 1e-08
number of instances: 36487
[Shuffled] Shuffle the training instance ids
[Info] The model will be saved to: model_files/lstm_2_200_crf_ontonotes_chinese_sd_-1_dep_none_elmo_none_sgd_gate_0_epoch_100_lr_0.01_comb_InteractionFunction.mlp.m, please ensure models folder exist
learning rate is set to:  0.01
Epoch 1: 285262.48737, Time is 354.71s
[dev set] Precision: 63.44, Recall: 45.57, F1: 53.04
[test set] Precision: 64.58, Recall: 46.49, F1: 54.06
saving the best model...
learning rate is set to:  0.01
Epoch 2: 145113.75470, Time is 355.09s
[dev set] Precision: 72.55, Recall: 56.22, F1: 63.35
[test set] Precision: 73.36, Recall: 56.95, F1: 64.12
saving the best model...
learning rate is set to:  0.01
Epoch 3: 110279.62048, Time is 380.69s
[dev set] Precision: 71.95, Recall: 62.87, F1: 67.11
[test set] Precision: 73.27, Recall: 64.60, F1: 68.66
saving the best model...
learning rate is set to:  0.01
Epoch 4: 92156.31793, Time is 357.13s
[dev set] Precision: 72.52, Recall: 64.47, F1: 68.26
[test set] Precision: 74.66, Recall: 66.71, F1: 70.46
saving the best model...
learning rate is set to:  0.01
Epoch 5: 80271.48053, Time is 383.48s
[dev set] Precision: 74.60, Recall: 64.76, F1: 69.34
[test set] Precision: 75.64, Recall: 66.37, F1: 70.70
saving the best model...
learning rate is set to:  0.01
Epoch 6: 72338.41998, Time is 365.83s
[dev set] Precision: 73.89, Recall: 66.18, F1: 69.82
[test set] Precision: 74.94, Recall: 68.12, F1: 71.37
saving the best model...
learning rate is set to:  0.01
Epoch 7: 67194.87714, Time is 336.44s
[dev set] Precision: 75.67, Recall: 65.22, F1: 70.06
[test set] Precision: 76.55, Recall: 67.20, F1: 71.57
saving the best model...
learning rate is set to:  0.01
Epoch 8: 63102.04980, Time is 341.07s
[dev set] Precision: 72.92, Recall: 66.38, F1: 69.50
[test set] Precision: 74.87, Recall: 69.60, F1: 72.14
learning rate is set to:  0.01
Epoch 9: 60138.14362, Time is 373.32s
[dev set] Precision: 76.52, Recall: 64.14, F1: 69.78
[test set] Precision: 77.26, Recall: 66.29, F1: 71.36
learning rate is set to:  0.01
Epoch 10: 56856.79376, Time is 378.18s
[dev set] Precision: 73.41, Recall: 68.32, F1: 70.77
[test set] Precision: 74.27, Recall: 69.98, F1: 72.06
saving the best model...
learning rate is set to:  0.01
Epoch 11: 54602.61407, Time is 374.20s
[dev set] Precision: 74.37, Recall: 66.14, F1: 70.01
[test set] Precision: 75.20, Recall: 67.79, F1: 71.30
learning rate is set to:  0.01
Epoch 12: 52556.44318, Time is 382.57s
[dev set] Precision: 74.39, Recall: 66.27, F1: 70.09
[test set] Precision: 76.18, Recall: 68.95, F1: 72.38
learning rate is set to:  0.01
Epoch 13: 50726.64886, Time is 315.86s
[dev set] Precision: 74.05, Recall: 64.97, F1: 69.21
[test set] Precision: 75.41, Recall: 67.37, F1: 71.17
learning rate is set to:  0.01
Epoch 14: 48930.41962, Time is 371.07s
[dev set] Precision: 74.74, Recall: 64.76, F1: 69.39
[test set] Precision: 75.88, Recall: 66.92, F1: 71.12
learning rate is set to:  0.01
Epoch 15: 47359.81354, Time is 370.67s
[dev set] Precision: 74.76, Recall: 64.73, F1: 69.38
[test set] Precision: 75.91, Recall: 67.23, F1: 71.30
learning rate is set to:  0.01
Epoch 16: 45624.51654, Time is 373.16s
[dev set] Precision: 74.90, Recall: 65.15, F1: 69.68
[test set] Precision: 76.09, Recall: 68.01, F1: 71.82
learning rate is set to:  0.01
Epoch 17: 44205.07025, Time is 383.83s
[dev set] Precision: 72.88, Recall: 66.96, F1: 69.80
[test set] Precision: 73.88, Recall: 69.55, F1: 71.65
learning rate is set to:  0.01
Epoch 18: 42528.52795, Time is 358.87s
[dev set] Precision: 74.55, Recall: 64.53, F1: 69.18
[test set] Precision: 76.10, Recall: 67.31, F1: 71.43
learning rate is set to:  0.01
Epoch 19: 41136.60205, Time is 365.05s
[dev set] Precision: 74.41, Recall: 66.00, F1: 69.95
[test set] Precision: 75.72, Recall: 68.69, F1: 72.04
learning rate is set to:  0.01
Epoch 20: 39516.44708, Time is 365.03s
[dev set] Precision: 74.95, Recall: 65.40, F1: 69.85
[test set] Precision: 76.15, Recall: 68.16, F1: 71.93
learning rate is set to:  0.01
Epoch 21: 37937.25543, Time is 354.72s
[dev set] Precision: 73.59, Recall: 65.73, F1: 69.44
[test set] Precision: 74.45, Recall: 68.60, F1: 71.41
learning rate is set to:  0.01
Epoch 22: 36180.26184, Time is 330.58s
[dev set] Precision: 74.52, Recall: 65.45, F1: 69.69
[test set] Precision: 75.91, Recall: 68.09, F1: 71.79
learning rate is set to:  0.01
Epoch 23: 34918.09973, Time is 382.19s
[dev set] Precision: 74.38, Recall: 64.57, F1: 69.12
[test set] Precision: 75.31, Recall: 67.16, F1: 71.00
learning rate is set to:  0.01
Epoch 24: 33956.29456, Time is 356.17s
[dev set] Precision: 75.15, Recall: 64.07, F1: 69.17
[test set] Precision: 76.46, Recall: 66.67, F1: 71.23
learning rate is set to:  0.01
Epoch 25: 32491.26746, Time is 386.56s
[dev set] Precision: 73.29, Recall: 64.39, F1: 68.55
[test set] Precision: 75.02, Recall: 67.56, F1: 71.09
learning rate is set to:  0.01
Epoch 26: 31350.37323, Time is 367.27s
[dev set] Precision: 73.43, Recall: 65.59, F1: 69.29
[test set] Precision: 74.62, Recall: 68.36, F1: 71.36
learning rate is set to:  0.01
Epoch 27: 29972.20898, Time is 376.46s
[dev set] Precision: 74.56, Recall: 63.12, F1: 68.36
[test set] Precision: 75.43, Recall: 66.00, F1: 70.40
learning rate is set to:  0.01
Epoch 28: 29056.32819, Time is 385.42s
[dev set] Precision: 74.73, Recall: 63.29, F1: 68.54
[test set] Precision: 76.20, Recall: 66.12, F1: 70.80
learning rate is set to:  0.01
Epoch 29: 28039.82849, Time is 380.09s
[dev set] Precision: 73.82, Recall: 64.69, F1: 68.95
[test set] Precision: 74.73, Recall: 67.56, F1: 70.97
learning rate is set to:  0.01
Epoch 30: 27130.20428, Time is 383.35s
[dev set] Precision: 74.79, Recall: 64.51, F1: 69.27
[test set] Precision: 75.62, Recall: 67.16, F1: 71.14
learning rate is set to:  0.01
Epoch 31: 26052.45343, Time is 383.74s
[dev set] Precision: 73.42, Recall: 63.06, F1: 67.85
[test set] Precision: 75.07, Recall: 65.97, F1: 70.23
learning rate is set to:  0.01
Epoch 32: 25246.19189, Time is 375.13s
[dev set] Precision: 74.74, Recall: 63.43, F1: 68.62
[test set] Precision: 75.71, Recall: 66.19, F1: 70.63
learning rate is set to:  0.01
Epoch 33: 24432.55402, Time is 376.70s
[dev set] Precision: 74.43, Recall: 64.59, F1: 69.16
[test set] Precision: 75.18, Recall: 67.31, F1: 71.03
learning rate is set to:  0.01
Epoch 34: 23708.08826, Time is 373.94s
[dev set] Precision: 72.38, Recall: 66.36, F1: 69.24
[test set] Precision: 73.22, Recall: 68.86, F1: 70.97
learning rate is set to:  0.01
Epoch 35: 22966.82373, Time is 377.03s
[dev set] Precision: 73.23, Recall: 65.49, F1: 69.14
[test set] Precision: 74.17, Recall: 68.08, F1: 70.99
learning rate is set to:  0.01
Epoch 36: 22225.60510, Time is 387.31s
[dev set] Precision: 73.54, Recall: 64.49, F1: 68.72
[test set] Precision: 74.38, Recall: 67.36, F1: 70.70
learning rate is set to:  0.01
Epoch 37: 21468.44885, Time is 388.09s
[dev set] Precision: 74.43, Recall: 63.43, F1: 68.49
[test set] Precision: 75.29, Recall: 66.39, F1: 70.56
learning rate is set to:  0.01
Epoch 38: 21135.80164, Time is 372.39s
[dev set] Precision: 74.25, Recall: 64.08, F1: 68.79
[test set] Precision: 74.98, Recall: 66.48, F1: 70.48
learning rate is set to:  0.01
Epoch 39: 20200.03711, Time is 328.25s
[dev set] Precision: 74.79, Recall: 62.48, F1: 68.08
[test set] Precision: 75.73, Recall: 65.12, F1: 70.02
learning rate is set to:  0.01
Epoch 40: 19579.57947, Time is 367.86s
[dev set] Precision: 74.05, Recall: 63.64, F1: 68.45
[test set] Precision: 74.92, Recall: 66.69, F1: 70.57
learning rate is set to:  0.01
Epoch 41: 19219.25000, Time is 383.97s
[dev set] Precision: 73.92, Recall: 63.60, F1: 68.37
[test set] Precision: 74.47, Recall: 66.04, F1: 70.00
learning rate is set to:  0.01
Epoch 42: 18975.31970, Time is 381.50s
[dev set] Precision: 73.81, Recall: 64.40, F1: 68.79
[test set] Precision: 75.01, Recall: 66.92, F1: 70.73
learning rate is set to:  0.01
Epoch 43: 18075.84424, Time is 375.39s
[dev set] Precision: 73.53, Recall: 64.75, F1: 68.86
[test set] Precision: 74.61, Recall: 67.49, F1: 70.88
learning rate is set to:  0.01
Epoch 44: 17629.43896, Time is 378.80s
[dev set] Precision: 72.85, Recall: 63.92, F1: 68.09
[test set] Precision: 73.61, Recall: 66.71, F1: 69.99
speedcell4 commented 4 years ago

It seems I downloaded the wrong FastText embedding file? It should be `cc.zh.300.vec`, not `wiki.zh.vec`?

allanj commented 4 years ago

Not sure if you downloaded it from here: https://fasttext.cc/docs/en/crawl-vectors.html (direct link: https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.zh.300.vec.gz)
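
If it helps, here is a minimal download-and-unpack sketch (my own, not repo code) using the direct link above; a plain `wget` plus `gunzip` works just as well. The target path `data/cc.zh.300.vec` matches the `--embedding_file` value in the log below and assumes the `data/` directory already exists.

```python
import gzip
import shutil
import urllib.request

URL = "https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.zh.300.vec.gz"

# Download the gzipped vectors (several GB) and stream-decompress them into data/.
urllib.request.urlretrieve(URL, "cc.zh.300.vec.gz")
with gzip.open("cc.zh.300.vec.gz", "rb") as src, open("data/cc.zh.300.vec", "wb") as dst:
    shutil.copyfileobj(src, dst)
```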

allanj commented 4 years ago

Here is my log:

100%|##########| 2000001/2000001 [05:47<00:00, 5759.91it/s]
100%|##########| 792550/792550 [00:02<00:00, 286725.60it/s]
100%|##########| 116117/116117 [00:00<00:00, 226584.28it/s]
100%|##########| 96780/96780 [00:00<00:00, 314986.30it/s]
mode: train
device: cuda:1
seed: 42
digit2zero: True
dataset: ontonotes_chinese
affix: sd
embedding_file: data/cc.zh.300.vec
embedding_dim: 100
optimizer: sgd
learning_rate: 0.01
momentum: 0.0
l2: 1e-08
lr_decay: 0.0
batch_size: 10
num_epochs: 100
train_num: -1
dev_num: -1
test_num: -1
eval_freq: 10000
eval_epoch: 0
hidden_dim: 200
num_lstm_layer: 2
dep_emb_size: 50
dep_hidden_dim: 200
num_gcn_layers: 1
gcn_mlp_layers: 1
gcn_dropout: 0.5
gcn_adj_directed: 0
gcn_adj_selfloop: 0
gcn_gate: 0
num_base: -1
dep_double_label: 0
dropout: 0.5
use_char_rnn: 1
dep_method: none
comb_method: 3
context_emb: none
[Info] remember to chec the root dependency label if changing the data. current: root
reading the pretraing embedding: data/cc.zh.300.vec
using GPU... 0
Reading file: data/ontonotes_chinese/train.sd.conllx
number of sentences: 36487, number of entities: 62543
Reading file: data/ontonotes_chinese/dev.sd.conllx
number of sentences: 6083, number of entities: 9104
Reading file: data/ontonotes_chinese/test.sd.conllx
number of sentences: 4472, number of entities: 7494
#labels: 76
label 2idx: {'<PAD>': 0, 'O': 1, 'B-WORK_OF_ART': 2, 'I-WORK_OF_ART': 3, 'E-WORK_OF_ART': 4, 'S-NORP': 5, 'S-EVENT': 6, 'S-LOC': 7, 'S-FAC': 8, 'S-ORG': 9, 'S-GPE': 10, 'B-EVENT': 11, 'I-EVENT': 12, 'E-EVENT': 13, 'S-DATE': 14, 'B-ORG': 15, 'E-ORG': 16, 'S-PERSON': 17, 'B-DATE': 18, 'E-DATE': 19, 'I-DATE': 20, 'B-FAC': 21, 'E-FAC': 22, 'B-QUANTITY': 23, 'E-QUANTITY': 24, 'B-LOC': 25, 'E-LOC': 26, 'S-ORDINAL': 27, 'S-CARDINAL': 28, 'B-TIME': 29, 'I-TIME': 30, 'E-TIME': 31, 'I-FAC': 32, 'I-ORG': 33, 'I-LOC': 34, 'B-GPE': 35, 'E-GPE': 36, 'S-TIME': 37, 'B-LAW': 38, 'I-LAW': 39, 'E-LAW': 40, 'B-PERSON': 41, 'E-PERSON': 42, 'S-PERCENT': 43, 'B-MONEY': 44, 'I-MONEY': 45, 'E-MONEY': 46, 'I-QUANTITY': 47, 'S-LANGUAGE': 48, 'I-GPE': 49, 'S-WORK_OF_ART': 50, 'B-ORDINAL': 51, 'E-ORDINAL': 52, 'B-CARDINAL': 53, 'E-CARDINAL': 54, 'S-QUANTITY': 55, 'B-NORP': 56, 'E-NORP': 57, 'I-PERSON': 58, 'S-PRODUCT': 59, 'I-NORP': 60, 'S-LAW': 61, 'B-PERCENT': 62, 'I-PERCENT': 63, 'E-PERCENT': 64, 'I-CARDINAL': 65, 'S-MONEY': 66, 'I-ORDINAL': 67, 'B-PRODUCT': 68, 'I-PRODUCT': 69, 'E-PRODUCT': 70, 'B-LANGUAGE': 71, 'E-LANGUAGE': 72, 'I-LANGUAGE': 73, '<START>': 74, '<STOP>': 75}
Building the embedding table for vocabulary...
[Info] Use the pretrained word embedding to initialize: 48050 x 300
num chars: 4280
num words: 48050
[Info] Building character-level LSTM
[Model Info] Input size to LSTM: 350
[Model Info] LSTM Hidden Size: 200
[Model Info] Dep Method: none, hidden size: 200
[Model Info] Final Hidden Size: 200
Using SGD: lr is: 0.01, L2 regularization is: 1e-08
number of instances: 36487
[Shuffled] Shuffle the training instance ids
[Info] The model will be saved to: model_files/lstm_2_200_crf_ontonotes_chinese_sd_-1_dep_none_elmo_none_sgd_gate_0_base_-1_epoch_100_lr_0.01_doubledep_0_comb_3_num_-1.m, please ensure models folder exist
learning rate is set to:  0.01
Epoch 1: 207501.81577, Time is 371.88s
[dev set] Precision: 70.07, Recall: 67.59, F1: 68.81
[test set] Precision: 71.79, Recall: 70.14, F1: 70.96
saving the best model...
learning rate is set to:  0.01
Epoch 2: 98910.82928, Time is 378.90s
[dev set] Precision: 75.51, Recall: 67.40, F1: 71.22
[test set] Precision: 77.07, Recall: 69.52, F1: 73.10
saving the best model...
learning rate is set to:  0.01
Epoch 3: 77156.39874, Time is 360.30s
[dev set] Precision: 75.22, Recall: 69.17, F1: 72.06
[test set] Precision: 76.94, Recall: 72.03, F1: 74.40
saving the best model...
learning rate is set to:  0.01
Epoch 4: 66149.95203, Time is 377.85s
[dev set] Precision: 75.36, Recall: 71.57, F1: 73.42
[test set] Precision: 76.79, Recall: 74.27, F1: 75.51
saving the best model...
learning rate is set to:  0.01
Epoch 5: 58631.06305, Time is 371.01s
[dev set] Precision: 75.85, Recall: 70.65, F1: 73.16
[test set] Precision: 77.89, Recall: 73.62, F1: 75.69
learning rate is set to:  0.01
Epoch 6: 53461.45251, Time is 369.11s
[dev set] Precision: 72.94, Recall: 71.35, F1: 72.14
[test set] Precision: 74.62, Recall: 74.69, F1: 74.65
learning rate is set to:  0.01
Epoch 7: 49768.43829, Time is 364.93s
[dev set] Precision: 76.17, Recall: 68.93, F1: 72.37
[test set] Precision: 78.27, Recall: 72.19, F1: 75.11
learning rate is set to:  0.01
Epoch 8: 46370.73358, Time is 363.80s
[dev set] Precision: 74.28, Recall: 72.01, F1: 73.13
[test set] Precision: 76.24, Recall: 75.39, F1: 75.81
allanj commented 4 years ago

Yeah right. I can see your embedding is different:

100%|##########| 2000001/2000001 [05:47<00:00, 5759.91it/s]
100%|##########| 792550/792550 [00:02<00:00, 286725.60it/s]
100%|##########| 116117/116117 [00:00<00:00, 226584.28it/s]
100%|##########| 96780/96780 [00:00<00:00, 314986.30it/s]

`cc.zh.300.vec` has a vocab size of 2000001.
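
A quick way to double-check which file you have (a sketch under the assumption of the standard fastText text format, where the first line is a `<vocab_size> <dim>` header; the tqdm totals above count every line of the file, header included):

```python
# Read just the header line of the .vec file to confirm vocabulary size and dimension.
with open("data/cc.zh.300.vec", encoding="utf-8") as f:
    vocab_size, dim = f.readline().split()
print(vocab_size, dim)  # expect roughly 2000000 and 300 for cc.zh.300.vec
```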

speedcell4 commented 4 years ago

Thank you so much, I will try that embedding file. I had downloaded mine from https://fasttext.cc/docs/en/pretrained-vectors.html

allanj commented 4 years ago

The link you sent actually refers to the old version. Yeah, please try the new version.

speedcell4 commented 4 years ago

Sorry for bothering you again. For Chinese ELMo, which pre-trained embedding should I download?

(screenshot: available pre-trained Chinese ELMo downloads)

allanj commented 4 years ago

Sorry. I don't really remember. But I guess it's the one below.

speedcell4 commented 4 years ago

Thank you~ You are right; I ran the experiments, and it is the one below.

allanj commented 4 years ago

Good to hear it works!