649453932 / Chinese-Text-Classification-Pytorch

Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer, based on PyTorch, ready to use out of the box.
MIT License

Loss fluctuates wildly when training on long-text data and the model never converges. Why? #44

Open sxk000 opened 4 years ago

sxk000 commented 4 years ago

Thanks for sharing! When I ran the short-text data included with the code, my results matched yours. I then tried the model on long text, still the THUCNews dataset, using the data from https://github.com/gaussic/text-classification-cnn-rnn, download link: https://pan.baidu.com/s/1hugrfRu password: qfud

Setting pad_size to 600 or 1000 makes no difference. The log:

Loading data...
50000it [00:27, 1849.42it/s]
5000it [00:02, 1879.22it/s]
10000it [00:05, 1667.15it/s]
Time usage: 0:00:36
<bound method Module.parameters of Model(
  (embedding): Embedding(6206, 300)
  (convs): ModuleList(
    (0): Conv2d(1, 256, kernel_size=(2, 300), stride=(1, 1))
    (1): Conv2d(1, 256, kernel_size=(3, 300), stride=(1, 1))
    (2): Conv2d(1, 256, kernel_size=(4, 300), stride=(1, 1))
  )
  (dropout): Dropout(p=0.5)
  (fc): Linear(in_features=768, out_features=10, bias=True)
)>
Epoch [1/20]
Iter: 0, Train Loss: 2.2, Train Acc: 17.97%, Val Loss: 3.5, Val Acc: 10.00%, Time: 0:00:03
Iter: 100, Train Loss: 1.4e-06, Train Acc: 100.00%, Val Loss: 2.3e+01, Val Acc: 10.00%, Time: 0:00:41
Iter: 200, Train Loss: 1.2e+01, Train Acc: 0.00%, Val Loss: 2.1e+01, Val Acc: 10.00%, Time: 0:01:19
Iter: 300, Train Loss: 1.3e-06, Train Acc: 100.00%, Val Loss: 3.3e+01, Val Acc: 10.00%, Time: 0:01:58
Epoch [2/20]
Iter: 400, Train Loss: 1.8, Train Acc: 45.31%, Val Loss: 1.2e+01, Val Acc: 14.48%, Time: 0:02:35
Iter: 500, Train Loss: 2.4e-05, Train Acc: 100.00%, Val Loss: 2.2e+01, Val Acc: 10.00%, Time: 0:03:13
Iter: 600, Train Loss: 1.0, Train Acc: 68.75%, Val Loss: 3.2, Val Acc: 15.80%, Time: 0:03:51
Iter: 700, Train Loss: 0.06, Train Acc: 98.44%, Val Loss: 8.1, Val Acc: 10.00%, Time: 0:04:29
Epoch [3/20]
Iter: 800, Train Loss: 0.0094, Train Acc: 100.00%, Val Loss: 1.6e+01, Val Acc: 10.00%, Time: 0:05:06
Iter: 900, Train Loss: 2.1e+01, Train Acc: 0.00%, Val Loss: 2.3e+01, Val Acc: 10.00%, Time: 0:05:45
Iter: 1000, Train Loss: 0.25, Train Acc: 99.22%, Val Loss: 3.5, Val Acc: 11.32%, Time: 0:06:23
Iter: 1100, Train Loss: 3.9, Train Acc: 0.00%, Val Loss: 3.9, Val Acc: 13.44%, Time: 0:07:01
Epoch [4/20]
Iter: 1200, Train Loss: 0.0073, Train Acc: 100.00%, Val Loss: 1.1e+01, Val Acc: 11.70%, Time: 0:07:38
Iter: 1300, Train Loss: 0.47, Train Acc: 85.94%, Val Loss: 7.9, Val Acc: 18.34%, Time: 0:08:17
Iter: 1400, Train Loss: 0.063, Train Acc: 100.00%, Val Loss: 4.9, Val Acc: 12.82%, Time: 0:08:55
Iter: 1500, Train Loss: 0.47, Train Acc: 89.84%, Val Loss: 3.1, Val Acc: 23.12%, Time: 0:09:33 *
Epoch [5/20]
Iter: 1600, Train Loss: 0.0037, Train Acc: 100.00%, Val Loss: 7.4, Val Acc: 15.36%, Time: 0:10:10
Iter: 1700, Train Loss: 0.027, Train Acc: 100.00%, Val Loss: 1.3e+01, Val Acc: 10.34%, Time: 0:10:49
Iter: 1800, Train Loss: 3.7, Train Acc: 3.91%, Val Loss: 5.2, Val Acc: 11.98%, Time: 0:11:27
Iter: 1900, Train Loss: 0.1, Train Acc: 98.44%, Val Loss: 4.0, Val Acc: 24.80%, Time: 0:12:05
Epoch [6/20]
Iter: 2000, Train Loss: 0.72, Train Acc: 76.56%, Val Loss: 4.2, Val Acc: 28.34%, Time: 0:12:42
Iter: 2100, Train Loss: 0.034, Train Acc: 99.22%, Val Loss: 8.1, Val Acc: 13.46%, Time: 0:13:20
Iter: 2200, Train Loss: 0.4, Train Acc: 92.97%, Val Loss: 4.2, Val Acc: 33.14%, Time: 0:13:59
Iter: 2300, Train Loss: 0.2, Train Acc: 96.09%, Val Loss: 4.6, Val Acc: 22.86%, Time: 0:14:36
Epoch [7/20]
Iter: 2400, Train Loss: 0.16, Train Acc: 96.88%, Val Loss: 3.7, Val Acc: 35.28%, Time: 0:15:13
Iter: 2500, Train Loss: 0.014, Train Acc: 99.22%, Val Loss: 8.2, Val Acc: 11.10%, Time: 0:15:52
No optimization for a long time, auto-stopping...
/usr/local/lib/python3.5/dist-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
Test Loss: 3.3, Test Acc: 23.61%
Precision, Recall and F1-Score...
              precision    recall  f1-score   support

      a     0.8361    0.0510    0.0961      1000
      b     1.0000    0.0140    0.0276      1000
      c     1.0000    0.0020    0.0040      1000
      d     0.0000    0.0000    0.0000      1000
      e     0.0000    0.0000    0.0000      1000
      f     0.1711    0.7450    0.2783      1000
      g     0.8728    0.5490    0.6740      1000
      h     0.0791    0.0510    0.0620      1000
      i     0.2523    0.9490    0.3986      1000
      j     0.0000    0.0000    0.0000      1000

avg / total 0.4211 0.2361 0.1541 10000

Confusion Matrix...
[[ 51   0   0   0   0 325   0   0  91 533]
 [  3  14   0   0   0 627   1 159 196   0]
 [  1   0   2   0   0 182   3 415 397   0]
 [  2   0   0   0   0 458  56  16 468   0]
 [  3   0   0   0   0 866   4   4 123   0]
 [  0   0   0   0   0 745  12   0 243   0]
 [  1   0   0   0   0  55 549   0 395   0]
 [  0   0   0   0   0 891   0  51  58   0]
 [  0   0   0   0   0  50   1   0 949   0]
 [  0   0   0   0   0 155   3   0 842   0]]
Time usage: 0:00:06

Why does the loss fluctuate so wildly on long-text data that the model never trains? How should I fix it? Thanks!

sxk000 commented 4 years ago

I found the cause. In this dataset the samples of each class are stored contiguously: rows 0-5000 are class a, rows 5001-10000 are class b, and so on. This repo's data-processing code never shuffles the samples, and training uses require_improvement = 1000, i.e. the best model seen within each stretch of roughly 1000 batches is checkpointed. As a result, every saved model is merely the best model for the particular slice of data it was just trained on, and since that slice consists mostly (or entirely) of a single class, each checkpoint is optimal for one class only. When training moves on to the next stretch, the class changes and the loss shoots back up, so it keeps oscillating and never converges.

The fix is to randomly shuffle the data during preprocessing. The concrete changes:

In utils.py: change line 2 from `import os` to `import os, random`, and immediately before line 64 (`return contents  # [([...], 0), ([...], 1), ...]`) add one line: `random.shuffle(contents)  # randomly shuffle the data`

Make the same change in utils_fasttext.py: change line 2 from `import os` to `import os, random`, and immediately before line 83 (`return contents  # [([...], 0), ([...], 1), ...]`) add: `random.shuffle(contents)  # randomly shuffle the data`. See the sketch below for where the two lines land.
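For clarity, here is a minimal sketch of what the patched load_dataset in utils.py looks like after both edits. The function body is paraphrased from the repo, so details such as the tokenizer and the one-`content\tlabel`-per-line file format may differ slightly from your copy; only the two marked lines are the actual fix.

```python
import os, random  # line 2: `random` added for the shuffle below (fix, part 1)

UNK, PAD = '<UNK>', '<PAD>'

def load_dataset(path, vocab, tokenizer, pad_size=600):
    contents = []
    with open(path, 'r', encoding='UTF-8') as f:
        for line in f:
            lin = line.strip()
            if not lin:
                continue
            content, label = lin.split('\t')
            token = tokenizer(content)
            seq_len = len(token)
            # pad or truncate every sample to exactly pad_size tokens
            if len(token) < pad_size:
                token.extend([PAD] * (pad_size - len(token)))
            else:
                token = token[:pad_size]
                seq_len = pad_size
            # map tokens to vocabulary ids, falling back to <UNK>
            words_line = [vocab.get(word, vocab.get(UNK)) for word in token]
            contents.append((words_line, int(label), seq_len))
    random.shuffle(contents)  # fix, part 2: break up the class-ordered blocks
    return contents  # [([...], 0), ([...], 1), ...]
```

Without the shuffle, consecutive batches drawn from this list are almost pure single-class blocks, which is exactly what produced the oscillating loss above.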

The training and test log after the change:

Loading data...
50000it [00:47, 1062.81it/s]
5000it [00:04, 1068.21it/s]
10000it [00:10, 953.53it/s]
Time usage: 0:01:02
<bound method Module.parameters of Model(
  (embedding): Embedding(6206, 300)
  (convs): ModuleList(
    (0): Conv2d(1, 256, kernel_size=(2, 300), stride=(1, 1))
    (1): Conv2d(1, 256, kernel_size=(3, 300), stride=(1, 1))
    (2): Conv2d(1, 256, kernel_size=(4, 300), stride=(1, 1))
  )
  (dropout): Dropout(p=0.5)
  (fc): Linear(in_features=768, out_features=10, bias=True)
)>
Epoch [1/20]
Iter: 0, Train Loss: 2.3, Train Acc: 8.59%, Val Loss: 2.4, Val Acc: 10.14%, Time: 0:00:06
Iter: 100, Train Loss: 0.27, Train Acc: 92.19%, Val Loss: 0.33, Val Acc: 89.12%, Time: 0:01:27
Iter: 200, Train Loss: 0.2, Train Acc: 93.75%, Val Loss: 0.22, Val Acc: 93.68%, Time: 0:02:47
Iter: 300, Train Loss: 0.11, Train Acc: 96.88%, Val Loss: 0.17, Val Acc: 95.08%, Time: 0:04:08
Epoch [2/20]
Iter: 400, Train Loss: 0.22, Train Acc: 94.53%, Val Loss: 0.2, Val Acc: 94.24%, Time: 0:05:27
Iter: 500, Train Loss: 0.18, Train Acc: 94.53%, Val Loss: 0.15, Val Acc: 95.56%, Time: 0:06:48
Iter: 600, Train Loss: 0.098, Train Acc: 96.09%, Val Loss: 0.15, Val Acc: 95.54%, Time: 0:08:08
Iter: 700, Train Loss: 0.065, Train Acc: 97.66%, Val Loss: 0.12, Val Acc: 96.36%, Time: 0:09:28
Epoch [3/20]
Iter: 800, Train Loss: 0.059, Train Acc: 97.66%, Val Loss: 0.12, Val Acc: 96.52%, Time: 0:10:48
Iter: 900, Train Loss: 0.096, Train Acc: 96.88%, Val Loss: 0.14, Val Acc: 95.50%, Time: 0:12:08
Iter: 1000, Train Loss: 0.19, Train Acc: 93.75%, Val Loss: 0.18, Val Acc: 93.28%, Time: 0:13:29
Iter: 1100, Train Loss: 0.05, Train Acc: 98.44%, Val Loss: 0.14, Val Acc: 95.24%, Time: 0:14:49
Epoch [4/20]
Iter: 1200, Train Loss: 0.062, Train Acc: 97.66%, Val Loss: 0.11, Val Acc: 96.84%, Time: 0:16:09
Iter: 1300, Train Loss: 0.038, Train Acc: 99.22%, Val Loss: 0.1, Val Acc: 96.80%, Time: 0:17:29
Iter: 1400, Train Loss: 0.034, Train Acc: 99.22%, Val Loss: 0.11, Val Acc: 96.76%, Time: 0:18:49
Iter: 1500, Train Loss: 0.059, Train Acc: 98.44%, Val Loss: 0.13, Val Acc: 95.34%, Time: 0:20:09
Epoch [5/20]
Iter: 1600, Train Loss: 0.025, Train Acc: 99.22%, Val Loss: 0.14, Val Acc: 95.44%, Time: 0:21:29
Iter: 1700, Train Loss: 0.055, Train Acc: 97.66%, Val Loss: 0.1, Val Acc: 96.94%, Time: 0:22:49
Iter: 1800, Train Loss: 0.038, Train Acc: 98.44%, Val Loss: 0.15, Val Acc: 95.08%, Time: 0:24:09
Iter: 1900, Train Loss: 0.048, Train Acc: 99.22%, Val Loss: 0.11, Val Acc: 96.58%, Time: 0:25:29
Epoch [6/20]
Iter: 2000, Train Loss: 0.043, Train Acc: 99.22%, Val Loss: 0.12, Val Acc: 96.34%, Time: 0:26:49
Iter: 2100, Train Loss: 0.067, Train Acc: 98.44%, Val Loss: 0.16, Val Acc: 94.42%, Time: 0:28:09
Iter: 2200, Train Loss: 0.041, Train Acc: 98.44%, Val Loss: 0.12, Val Acc: 96.06%, Time: 0:29:30
Iter: 2300, Train Loss: 0.089, Train Acc: 96.88%, Val Loss: 0.099, Val Acc: 96.86%, Time: 0:30:50
Epoch [7/20]
Iter: 2400, Train Loss: 0.028, Train Acc: 98.44%, Val Loss: 0.12, Val Acc: 96.82%, Time: 0:32:09
Iter: 2500, Train Loss: 0.007, Train Acc: 100.00%, Val Loss: 0.13, Val Acc: 96.20%, Time: 0:33:30
Iter: 2600, Train Loss: 0.013, Train Acc: 99.22%, Val Loss: 0.14, Val Acc: 95.78%, Time: 0:34:50
Iter: 2700, Train Loss: 0.048, Train Acc: 99.22%, Val Loss: 0.14, Val Acc: 95.76%, Time: 0:36:10
Epoch [8/20]
Iter: 2800, Train Loss: 0.0045, Train Acc: 100.00%, Val Loss: 0.15, Val Acc: 95.34%, Time: 0:37:30
Iter: 2900, Train Loss: 0.0084, Train Acc: 100.00%, Val Loss: 0.15, Val Acc: 95.62%, Time: 0:38:50
Iter: 3000, Train Loss: 0.01, Train Acc: 99.22%, Val Loss: 0.12, Val Acc: 96.84%, Time: 0:40:10
Iter: 3100, Train Loss: 0.013, Train Acc: 99.22%, Val Loss: 0.18, Val Acc: 95.06%, Time: 0:41:30
Epoch [9/20]
Iter: 3200, Train Loss: 0.0037, Train Acc: 100.00%, Val Loss: 0.17, Val Acc: 94.94%, Time: 0:42:49
Iter: 3300, Train Loss: 0.031, Train Acc: 99.22%, Val Loss: 0.15, Val Acc: 95.56%, Time: 0:44:09
No optimization for a long time, auto-stopping...
Test Loss: 0.074, Test Acc: 97.80%
Precision, Recall and F1-Score...
              precision    recall  f1-score   support

      a     0.9990    0.9990    0.9990      1000
      b     0.9802    0.9880    0.9841      1000
      c     0.9898    0.9690    0.9793      1000
      d     0.9725    0.9550    0.9637      1000
      e     0.9731    0.9410    0.9568      1000
      f     0.9705    0.9860    0.9782      1000
      g     0.9753    0.9890    0.9821      1000
      h     0.9607    0.9790    0.9698      1000
      i     0.9714    0.9840    0.9776      1000
      j     0.9880    0.9900    0.9890      1000

avg / total 0.9781 0.9780 0.9780 10000

Confusion Matrix...
[[999   0   0   0   0   0   0   0   1   0]
 [  0 988   0   1   1   2   0   8   0   0]
 [  0  10 969   9   2   0   0   8   0   2]
 [  0   5   7 955   8   2   8  10   4   1]
 [  0   1   0   5 941  15   7  13  14   4]
 [  0   1   0   1   0 986   2   1   7   2]
 [  1   0   0   3   3   3 989   0   0   1]
 [  0   2   3   1   9   5   0 979   1   0]
 [  0   1   0   2   2   1   8   0 984   2]
 [  0   0   0   5   1   2   0   0   2 990]]
Time usage: 0:00:11

sxk000 commented 3 years ago

> Hi! I'm also training on long text, with the text length set to 1000, but during training the train loss and val loss slowly converge to 1, the test loss is 0, and the confusion matrix is all zeros. What could be causing this?

I already analyzed this in detail above; please read it carefully.

sxk000 commented 3 years ago

> My data is not class-contiguous. My initial analysis is that it's related to the word vectors. How do you process the vocab and word vectors?

The vocab and word vectors shouldn't be the problem; they work fine on my end.
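For reference, the vocabulary handling in this repo boils down to the following. This is a paraphrase of build_vocab in utils.py, not a verbatim copy, so argument names and defaults may differ slightly from your checkout; pretrained word vectors are then simply looked up row by row for the words kept in this vocab.

```python
from tqdm import tqdm

UNK, PAD = '<UNK>', '<PAD>'

def build_vocab(file_path, tokenizer, max_size, min_freq):
    """Count token frequencies and keep the max_size most frequent ones."""
    vocab_dic = {}
    with open(file_path, 'r', encoding='UTF-8') as f:
        for line in tqdm(f):
            lin = line.strip()
            if not lin:
                continue
            content = lin.split('\t')[0]  # text before the tab, label after it
            for word in tokenizer(content):
                vocab_dic[word] = vocab_dic.get(word, 0) + 1
    # drop rare tokens, sort by frequency, truncate, then assign integer ids
    vocab_list = sorted([kv for kv in vocab_dic.items() if kv[1] >= min_freq],
                        key=lambda x: x[1], reverse=True)[:max_size]
    vocab_dic = {word: idx for idx, (word, _) in enumerate(vocab_list)}
    vocab_dic.update({UNK: len(vocab_dic), PAD: len(vocab_dic) + 1})
    return vocab_dic
```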

sxk000 commented 3 years ago

> Could you share a contact? I'd like to ask you a few questions. QQ1363698077

Sorry, that's not convenient.

prozyworld commented 3 years ago

This is brilliant; it solved a huge problem for me. How did you even figure it out? If you hadn't pointed it out, I don't think I would ever have gotten long text working.

prozyworld commented 3 years ago

Could the predicted class scores be output as well?
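The repo's evaluation loop only prints the argmax class, but per-class scores are a small change: apply softmax to the logits. A minimal sketch, assuming a trained model from this repo and an input batch prepared the way its data iterator builds them; the helper name predict_with_scores and the variables texts and class_list are placeholders for illustration.

```python
import torch
import torch.nn.functional as F

def predict_with_scores(model, texts, class_list):
    """Print the top prediction and the full per-class score distribution."""
    model.eval()
    with torch.no_grad():
        logits = model(texts)                 # shape: [batch, num_classes]
        probs = F.softmax(logits, dim=1)      # normalize logits into class scores
        conf, pred = torch.max(probs, dim=1)  # top score and its class index
    for p, c, row in zip(pred.tolist(), conf.tolist(), probs.tolist()):
        print(f"predicted: {class_list[p]} (score {c:.4f})")
        print("all scores:", {cls: round(s, 4) for cls, s in zip(class_list, row)})
```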