Lizhen0628 / text_classification

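Text classification with RNN, LSTM, GRU, fastText, TextCNN, DPCNN, RNN-Att and LSTM-Att. Compatible with huggingface/transformers, and also uses transformers as the word-embedding model followed by CNN, RNN, attention and other layers for text classification. Includes a comparison of the models.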
491 stars, 76 forks

Error when training the HAN model #7

Open ZZKa opened 3 years ago

ZZKa commented 3 years ago

Have you tested the HAN model? When I train it I get RuntimeError: CUDA error: device-side assert triggered. What could be causing this? Any help would be much appreciated, thanks a lot!

Error log:

0it [00:00, ?it/s]Building prefix dict from the default dictionary ...
hierattnet
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model from cache /tmp/jieba.cache
Loading model cost 2.178 seconds.
Loading model cost 2.178 seconds.
Prefix dict has been built successfully.
Prefix dict has been built successfully.
50000it [06:22, 130.57it/s]
5000it [00:37, 132.38it/s]
10000it [01:20, 123.86it/s]
HierAttNet(
  (word_att_net): WordAttNet(
    (dropout): Dropout(p=0.5, inplace=False)
    (embedding): Embedding(144241, 300)
    (rnn): GRU(300, 64, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
  )
  (sent_att_net): SentAttNet(
    (rnn): GRU(128, 64, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
    (fc): Linear(in_features=128, out_features=10, bias=True)
  )
)
Trainable parameters: 398602
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [16,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: …… …… …… [14,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [82,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [14,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
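For context on what this assertion checks: `indexSelectLargeIndex` is the CUDA kernel behind row selection by index, and in the PyTorch version shown here a dense `nn.Embedding` lookup appears to be carried out as an `index_select` over the rows of its weight matrix. `srcIndex < srcSelectDimSize` therefore fails when some id is at least the number of rows (144241 for the printed `Embedding(144241, 300)`). The minimal CPU sketch below illustrates this with a made-up 10-row embedding; the sizes, ids and the `lookup_fails` helper are hypothetical and not from the repo. On CPU the same out-of-range id raises an ordinary Python exception instead of a device-side assert.

```python
import torch
import torch.nn as nn

# Toy embedding: 10 rows (valid ids 0..9), 4-dim vectors. Sizes are illustrative only.
emb = nn.Embedding(num_embeddings=10, embedding_dim=4)

ids = torch.tensor([[1, 2, 9]])  # shape (batch=1, seq_len=3), all ids in range

# A dense embedding lookup gives the same result as selecting rows of the weight matrix.
same = torch.equal(emb(ids), emb.weight.index_select(0, ids.reshape(-1)).view(1, 3, 4))


def lookup_fails(module: nn.Module, token_ids: torch.Tensor) -> bool:
    """Hypothetical helper: True if the lookup raises (CPU behavior for out-of-range ids)."""
    try:
        module(token_ids)
    except (IndexError, RuntimeError):  # exception type differs across PyTorch versions
        return True
    return False


bad = torch.tensor([[1, 2, 10]])  # 10 == num_embeddings, so this id has no row

print(same)                    # True
print(lookup_fails(emb, ids))  # False
print(lookup_fails(emb, bad))  # True
```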

Traceback (most recent call last):
  File "train.py", line 134, in <module>
    run('configs/multi_classification/han_config.json')
  File "train.py", line 105, in run
    main(config, use_transformers=False)
  File "train.py", line 80, in main
    trainer.train()
  File "/home/work/zzk/text_classification/base/base_trainer.py", line 67, in train
    result = self._train_epoch(epoch)
  File "/home/work/zzk/text_classification/trainer/trainer.py", line 52, in _train_epoch
    output = self.model(input_token_ids,bert_masks, seq_lens).squeeze(1)
  File "/home/work/anaconda3/envs/zzk_torch_py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/work/zzk/text_classification/model/model.py", line 404, in forward
    word_output, hidden = self.word_att_net(input_token_ids,seq_lens)
  File "/home/work/anaconda3/envs/zzk_torch_py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/work/zzk/text_classification/model/model.py", line 473, in forward
    packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, sorted_seq_lengths, batch_first=self.batch_first)
  File "/home/work/anaconda3/envs/zzk_torch_py36/lib/python3.6/site-packages/torch/nn/utils/rnn.py", line 234, in pack_padded_sequence
    lengths = torch.as_tensor(lengths, dtype=torch.int64)
RuntimeError: CUDA error: device-side assert triggered
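The traceback ends in `pack_padded_sequence` even though the failing assert is an index lookup. The most likely reason is that CUDA reports errors asynchronously, so the exception surfaces at a later call that synchronizes with the GPU rather than at the embedding lookup that computed `embedded`. Two common ways to get an accurate location are running with `CUDA_LAUNCH_BLOCKING=1` (for example `CUDA_LAUNCH_BLOCKING=1 python train.py`) or running one batch on CPU. A minimal sketch of both, assuming the model's `forward` does not hard-code a CUDA device; `forward_on_cpu` is a hypothetical debugging helper, not part of this repo.

```python
import os

# Make CUDA kernel launches synchronous so a device-side assert is reported at
# the operation that triggered it. It has to be set before CUDA is initialized;
# setting it before `import torch` is the safest place.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402  (deliberately imported after setting the variable)


def forward_on_cpu(model: torch.nn.Module, *inputs: torch.Tensor) -> torch.Tensor:
    """Hypothetical debugging helper: run one forward pass on CPU, where an
    out-of-range embedding id raises a normal Python exception at the lookup
    itself instead of an asynchronous CUDA assert."""
    model = model.cpu()  # note: .cpu() moves the module's parameters in place
    return model(*(t.cpu() for t in inputs))
```

With the inputs from `trainer.py` line 52, a call such as `forward_on_cpu(self.model, input_token_ids, bert_masks, seq_lens)` would then point at the exact line in `model.py` that does the failing lookup.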

ZZKa commented 3 years ago

I see that V2 has been released. Has the HAN model been removed?

Lizhen0628 commented 3 years ago

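@ZZKa HAN has not been added to V2; I will add it in a future update when I have time. Judging from your error output, my guess is that some token ids exceed the embedding size.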
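One way to check this guess before the batch reaches the GPU is to compare the ids against the embedding's `num_embeddings` (144241 for the printed `Embedding(144241, 300)`). The sketch below is a minimal illustration under stated assumptions: the helper names, the example batch and the `<unk>` id of 1 are hypothetical and not taken from the repo. The usual root-cause fix is to build the id mapping and size the embedding from the same vocabulary; remapping stray ids to `<unk>` only hides the mismatch.

```python
import torch


def find_out_of_range_ids(token_ids: torch.Tensor, num_embeddings: int) -> torch.Tensor:
    """Return the sorted unique ids that nn.Embedding(num_embeddings, ...) cannot look up."""
    bad = (token_ids < 0) | (token_ids >= num_embeddings)
    return torch.unique(token_ids[bad])  # boolean mask flattens; unique() sorts by default


def clamp_to_unk(token_ids: torch.Tensor, num_embeddings: int, unk_id: int) -> torch.Tensor:
    """Replace out-of-range ids with a (hypothetical) <unk> id. A workaround, not a fix."""
    bad = (token_ids < 0) | (token_ids >= num_embeddings)
    return token_ids.masked_fill(bad, unk_id)


# Hypothetical batch of word ids; 144241 and 144250 have no row in Embedding(144241, 300).
batch = torch.tensor([[3, 7, 144241], [1, 0, 144250]])

print(find_out_of_range_ids(batch, 144241))  # tensor([144241, 144250])
fixed = clamp_to_unk(batch, 144241, unk_id=1)
print(fixed.max().item())  # 7
```

The same check works unchanged on a 3-D HAN batch (documents × sentences × words), since the boolean mask flattens whatever shape it selects from.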