BrikerMan / Kashgari

Kashgari is a production-level NLP transfer learning framework built on top of tf.keras for text labeling and text classification. It includes Word2Vec, BERT, and GPT2 language embeddings.
http://kashgari.readthedocs.io/
Apache License 2.0

[BUG] BLSTMCRFModel reports an incorrect val_acc #10

Closed: phoenixkillerli closed this issue 5 years ago

phoenixkillerli commented 5 years ago

val_acc is computed incorrectly.

The test code is the example from the article "NLP: Chinese Named Entity Recognition (NER) with BERT".

from kashgari.corpus import ChinaPeoplesDailyNerCorpus
from kashgari.embeddings import BERTEmbedding
from kashgari.tasks.seq_labeling import BLSTMCRFModel

# Load the train / validate / test splits of the People's Daily NER corpus
train_x, train_y = ChinaPeoplesDailyNerCorpus.get_sequence_tagging_data('train')
validate_x, validate_y = ChinaPeoplesDailyNerCorpus.get_sequence_tagging_data('validate')
test_x, test_y = ChinaPeoplesDailyNerCorpus.get_sequence_tagging_data('test')

print(f"train data count: {len(train_x)}")
print(f"validate data count: {len(validate_x)}")
print(f"test data count: {len(test_x)}")

# Build a BERT embedding from the local './bert' folder, with sequence length 128
embedding = BERTEmbedding('./bert', 128)

# Train a BiLSTM-CRF tagger on top of the BERT embedding
model = BLSTMCRFModel(embedding)
model.fit(train_x,
          train_y,
          x_validate=validate_x,
          y_validate=validate_y,
          epochs=10,
          batch_size=500)

Part of the output:

Epoch 1/10
41/41 [==============================] - 105s 3s/step - loss: 0.2520 - crf_accuracy: 0.9303 - acc: 0.6245 - val_loss: 0.0724 - val_crf_accuracy: 0.9789 - val_acc: 0.9789
Epoch 2/10
41/41 [==============================] - 100s 2s/step - loss: 0.0548 - crf_accuracy: 0.9838 - acc: 0.6246 - val_loss: 0.0357 - val_crf_accuracy: 0.9898 - val_acc: 0.9898

val_crf_accuracy and val_acc come out identical.
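The observation above can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the log lines have the `name: value` shape shown in this issue; `parse_keras_log_line` and `METRIC_PATTERN` are hypothetical names introduced here for illustration and are not part of Kashgari or Keras.

```python
import re

# Matches "name: value" pairs such as "val_acc: 0.9789" in a progress line.
METRIC_PATTERN = re.compile(r"(\w+): ([0-9]*\.?[0-9]+)")


def parse_keras_log_line(line):
    """Return a dict mapping metric names to floats for one progress line."""
    return {name: float(value) for name, value in METRIC_PATTERN.findall(line)}


# Epoch 1 line copied from the output in this issue.
log_line = (
    "41/41 [==============================] - 105s 3s/step - loss: 0.2520 - "
    "crf_accuracy: 0.9303 - acc: 0.6245 - val_loss: 0.0724 - "
    "val_crf_accuracy: 0.9789 - val_acc: 0.9789"
)
metrics = parse_keras_log_line(log_line)

# The two validation metrics coincide, while the two training metrics do not.
same_on_validation = metrics["val_acc"] == metrics["val_crf_accuracy"]
same_on_training = metrics["acc"] == metrics["crf_accuracy"]
print(same_on_validation, same_on_training)
```

Running the same check over every epoch line would show whether the two validation metrics match throughout the run or only in the epochs quoted here.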

BrikerMan commented 5 years ago

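This is most likely a problem with my configuration. When training a CRF, crf_accuracy is normally the only metric used, but I set [crf_accuracy, 'acc'], so both are reported. For now, only pay attention to crf_accuracy and val_crf_accuracy.

The line responsible: https://github.com/BrikerMan/Kashgari/blob/e11f0b10819a5482c9c6e7c6f6777d092a00cc4e/kashgari/tasks/seq_labeling/blstm_crf_model.py#L44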
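To make the effect of the metrics list concrete, here is a small illustrative model of how the column names in the training log follow from it. It is a sketch under stated assumptions, not Keras or Kashgari code: it assumes a callable metric is logged under its `__name__`, a string metric under the string itself, and validation values under a `val_` prefix. `logged_metric_names` is a hypothetical helper, and `crf_accuracy` below is only a stand-in for the real metric function.

```python
def crf_accuracy(y_true, y_pred):
    """Stand-in for the real CRF accuracy metric; never called in this sketch."""
    raise NotImplementedError


def logged_metric_names(metrics, with_validation=True):
    """Illustrative model of the names shown in a Keras training log.

    Assumption: callables are logged under __name__, strings under
    themselves, and validation values get a "val_" prefix.
    """
    names = ["loss"] + [m if isinstance(m, str) else m.__name__ for m in metrics]
    if with_validation:
        names += ["val_" + name for name in names]
    return names


# Configuration described in the comment above: both metrics are reported.
before = logged_metric_names([crf_accuracy, "acc"])

# With 'acc' dropped from the list, only the CRF metric remains.
after = logged_metric_names([crf_accuracy])

print(before)
print(after)
```

Under these assumptions, the first list reproduces the six columns seen in the output of this issue, and the second shows what remains once 'acc' is dropped from the metrics list.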

phoenixkillerli commented 5 years ago

But val_acc and acc are very far apart. What causes that?

lonelyhentxi commented 5 years ago

Same problem here: val_acc and val_crf_accuracy are identical, and val_acc is above 0.9 while acc is only 0.7.

BrikerMan commented 5 years ago

The ambiguous acc metric has been removed.