GlassyWing / bi-lstm-crf

Chinese word segmentation and part-of-speech tagging based on Bi-LSTM + CRF, implemented with Keras
Apache License 2.0
373 stars 84 forks

Hello, during training the loss becomes negative and crf_accuracy keeps dropping #15

YYGe01 closed this issue 4 years ago

YYGe01 commented 4 years ago

Epoch 1/32 2000/2000 [==============================] - 2107s 1s/step - loss: 0.4634 - crf_accuracy: 0.8636 - val_loss: 0.1261 - val_crf_accuracy: 0.9155
Epoch 2/32 2000/2000 [==============================] - 2095s 1s/step - loss: 0.1106 - crf_accuracy: 0.9120 - val_loss: 0.1009 - val_crf_accuracy: 0.8968
Epoch 3/32 2000/2000 [==============================] - 2088s 1s/step - loss: 0.0789 - crf_accuracy: 0.8760 - val_loss: 0.0157 - val_crf_accuracy: 0.8600
Epoch 4/32 2000/2000 [==============================] - 2086s 1s/step - loss: -0.0747 - crf_accuracy: 0.8349 - val_loss: -0.2816 - val_crf_accuracy: 0.8311
Epoch 5/32 2000/2000 [==============================] - 2092s 1s/step - loss: -0.4189 - crf_accuracy: 0.8114 - val_loss: -0.7276 - val_crf_accuracy: 0.8185
Epoch 6/32 2000/2000 [==============================] - 2083s 1s/step - loss: -0.9522 - crf_accuracy: 0.7970 - val_loss: -1.4038 - val_crf_accuracy: 0.8003
Epoch 7/32 2000/2000 [==============================] - 2079s 1s/step - loss: -1.6091 - crf_accuracy: 0.7862 - val_loss: -2.0125 - val_crf_accuracy: 0.7894
Epoch 8/32 2000/2000 [==============================] - 2081s 1s/step - loss: -2.3388 - crf_accuracy: 0.7761 - val_loss: -2.7899 - val_crf_accuracy: 0.7861
Epoch 9/32 2000/2000 [==============================] - 1947s 974ms/step - loss: -3.0759 - crf_accuracy: 0.7687 - val_loss: -0.8432 - val_crf_accuracy: 0.7548
Epoch 10/32 2000/2000 [==============================] - 2082s 1s/step - loss: -4.0530 - crf_accuracy: 0.7636 - val_loss: -4.4467 - val_crf_accuracy: 0.7617
Epoch 11/32 2000/2000 [==============================] - 2010s 1s/step - loss: -5.0285 - crf_accuracy: 0.7501 - val_loss: -5.6542 - val_crf_accuracy: 0.7364
Epoch 12/32 2000/2000 [==============================] - 2082s 1s/step - loss: -6.1917 - crf_accuracy: 0.7440 - val_loss: -7.3522 - val_crf_accuracy: 0.7747
Epoch 13/32 2000/2000 [==============================] - 2091s 1s/step - loss: -7.3947 - crf_accuracy: 0.7500 - val_loss: -8.3709 - val_crf_accuracy: 0.7618
Epoch 14/32 2000/2000 [==============================] - 2088s 1s/step - loss: -8.7067 - crf_accuracy: 0.7481 - val_loss: -9.4891 - val_crf_accuracy: 0.7552
Epoch 15/32 2000/2000 [==============================] - 2081s 1s/step - loss: -10.0457 - crf_accuracy: 0.7477 - val_loss: -10.8923 - val_crf_accuracy: 0.7486
Epoch 16/32 2000/2000 [==============================] - 2075s 1s/step - loss: -11.6075 - crf_accuracy: 0.7496 - val_loss: -12.9277 - val_crf_accuracy: 0.7411
Epoch 17/32 2000/2000 [==============================] - 2084s 1s/step - loss: -13.4381 - crf_accuracy: 0.7502 - val_loss: -14.8373 - val_crf_accuracy: 0.7625
Epoch 18/32 2000/2000 [==============================] - 2088s 1s/step - loss: -14.9084 - crf_accuracy: 0.7379 - val_loss: -15.6858 - val_crf_accuracy: 0.7443
Epoch 19/32 2000/2000 [==============================] - 2091s 1s/step - loss: -16.6394 - crf_accuracy: 0.7373 - val_loss: -18.9170 - val_crf_accuracy: 0.7803
Epoch 20/32 2000/2000 [==============================] - 2095s 1s/step - loss: 8.7573 - crf_accuracy: 0.7414 - val_loss: -4.6712 - val_crf_accuracy: 0.7364
Epoch 21/32 2000/2000 [==============================] - 2053s 1s/step - loss: -0.2120 - crf_accuracy: 0.7432 - val_loss: -16.9051 - val_crf_accuracy: 0.7578
Epoch 22/32 2000/2000 [==============================] - 2091s 1s/step - loss: -17.4731 - crf_accuracy: 0.7500 - val_loss: -22.1556 - val_crf_accuracy: 0.7590
Epoch 23/32 2000/2000 [==============================] - 2090s 1s/step - loss: -22.3906 - crf_accuracy: 0.7471 - val_loss: -25.7236 - val_crf_accuracy: 0.7367
Epoch 24/32 2000/2000 [==============================] - 2087s 1s/step - loss: -24.3706 - crf_accuracy: 0.7457 - val_loss: -26.9764 - val_crf_accuracy: 0.7515
Epoch 25/32 2000/2000 [==============================] - 1965s 982ms/step - loss: -25.0672 - crf_accuracy: 0.7494 - val_loss: -23.7698 - val_crf_accuracy: 0.7411
Epoch 26/32 2000/2000 [==============================] - 2096s 1s/step - loss: -4.4255 - crf_accuracy: 0.7520 - val_loss: -20.3805 - val_crf_accuracy: 0.7790
Epoch 27/32 2000/2000 [==============================] - 2086s 1s/step - loss: -29.5299 - crf_accuracy: 0.7589 - val_loss: -34.7593 - val_crf_accuracy: 0.7548
Epoch 28/32 2000/2000 [==============================] - 2090s 1s/step - loss: -33.6366 - crf_accuracy: 0.7511 - val_loss: -35.7223 - val_crf_accuracy: 0.7674
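
To make the trend in a log like this easier to inspect, here is a minimal sketch that extracts the per-epoch metrics and reports the first epoch whose training loss is negative. The helper names (`parse_log`, `first_negative_epoch`) are hypothetical and not part of the repository; the pattern assumes the Keras progress-line format shown in this log.

```python
import re

# Matches one epoch summary; whitespace is matched loosely so that
# entries wrapped across lines are still picked up.
EPOCH_RE = re.compile(
    r"Epoch\s+(\d+)/\d+.*?-\s+loss:\s+(-?\d+\.\d+)\s+-\s+crf_accuracy:\s+(\d+\.\d+)"
    r"\s+-\s+val_loss:\s+(-?\d+\.\d+)\s+-\s+val_crf_accuracy:\s+(\d+\.\d+)",
    re.DOTALL,
)


def parse_log(text):
    """Return one dict of metrics per epoch found in the log text."""
    rows = []
    for m in EPOCH_RE.finditer(text):
        rows.append({
            "epoch": int(m.group(1)),
            "loss": float(m.group(2)),
            "crf_accuracy": float(m.group(3)),
            "val_loss": float(m.group(4)),
            "val_crf_accuracy": float(m.group(5)),
        })
    return rows


def first_negative_epoch(rows):
    """Return the first epoch number with a negative training loss, or None."""
    for row in rows:
        if row["loss"] < 0:
            return row["epoch"]
    return None


# Two entries copied from the log above.
sample = (
    "Epoch 3/32 2000/2000 [==============================] - 2088s 1s/step "
    "- loss: 0.0789 - crf_accuracy: 0.8760 - val_loss: 0.0157 - val_crf_accuracy: 0.8600 "
    "Epoch 4/32 2000/2000 [==============================] - 2086s 1s/step "
    "- loss: -0.0747 - crf_accuracy: 0.8349 - val_loss: -0.2816 - val_crf_accuracy: 0.8311"
)
rows = parse_log(sample)
print(first_negative_epoch(rows))
```

Run on the full log, the same helpers would show where the loss crosses zero and how crf_accuracy moves from that point on.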

Does this mean the model is overfitting? Which part of the code should I adjust? Thanks.

GlassyWing commented 4 years ago

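This happens because the CRF module in Keras uses negative log-likelihood as its loss function. To adjust this, add a ReLU layer at the end of the model: https://github.com/GlassyWing/bi-lstm-crf/blob/ed72b6be0068052a5a0fb117fa3c778cf2b671f3/dl_segmenter/core.py#L72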
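
As background for the reply above, the negative log-likelihood of a linear-chain CRF is log Z (the log of the sum over all label sequences) minus the score of the gold label sequence. The following is a minimal pure-Python sketch of that quantity, not the Keras CRF implementation; the function names (`path_score`, `log_partition`, `crf_nll`) and the toy scores are made up for illustration. With an exactly normalized partition function this value cannot go below zero, so steadily negative values like those in the log may say more about how the library computes the loss than about overfitting. That reading is an inference and is not confirmed in the thread.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(emissions, transitions, labels):
    """Unnormalized score of one label sequence."""
    score = emissions[0][labels[0]]
    for t in range(1, len(labels)):
        score += transitions[labels[t - 1]][labels[t]] + emissions[t][labels[t]]
    return score


def log_partition(emissions, transitions):
    """log Z over all label sequences, computed with the forward algorithm."""
    n_labels = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            emissions[t][j]
            + logsumexp([alpha[i] + transitions[i][j] for i in range(n_labels)])
            for j in range(n_labels)
        ]
    return logsumexp(alpha)


def crf_nll(emissions, transitions, labels):
    """Negative log-likelihood: log Z minus the gold path score."""
    return log_partition(emissions, transitions) - path_score(emissions, transitions, labels)


# Toy example: 3 time steps, 2 labels (all numbers are illustrative).
emissions = [[1.0, 0.2], [0.3, 1.5], [2.0, -1.0]]
transitions = [[0.5, -0.3], [-0.2, 0.8]]
gold = [0, 1, 0]

nll = crf_nll(emissions, transitions, gold)
print(nll)
```

Because the gold sequence is one of the sequences summed in Z, log Z is at least as large as the gold path score, which is why `nll` stays non-negative in this sketch.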