A possible bug - Githubissues

649453932 / Chinese-Text-Classification-Pytorch

Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer, based on PyTorch, ready to use out of the box.

MIT License

5.25k stars 1.22k forks

A possible bug #24

Closed hans208 closed 4 years ago

hans208 commented 4 years ago

In the load_dataset function in utils.py, line 56 is token.extend([vocab.get(PAD)] * (pad_size - len(token))), which pads token with the id of PAD. But the code below then does the word-to-id conversion:
for word in token: words_line.append(vocab.get(word, vocab.get(UNK))) Because the id of PAD is not a key in the vocab, every PAD ends up as the id of UNK, right? So I think line 56 should be token.extend([PAD] * (pad_size - len(token)))
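
For anyone who wants to reproduce this, here is a minimal sketch of the behaviour described above. It is not the repository's code: the toy vocab, the '<UNK>' and '<PAD>' strings, and the encode helper with its pad_with_id flag are all assumptions made up for illustration. Only the padding line and the word-to-id lookup follow the snippets quoted in this issue.

```python
# Hypothetical toy vocabulary; the real one is built from the training data.
UNK, PAD = '<UNK>', '<PAD>'
vocab = {'a': 0, 'b': 1, UNK: 2, PAD: 3}


def encode(token, pad_size, pad_with_id):
    """Pad or truncate a token list, then map each token to its id.

    pad_with_id=True mimics the line reported in this issue (pads with the
    PAD id); pad_with_id=False mimics the proposed fix (pads with PAD itself).
    """
    token = list(token)  # copy so the caller's list is left untouched
    if len(token) < pad_size:
        filler = vocab.get(PAD) if pad_with_id else PAD
        token.extend([filler] * (pad_size - len(token)))
    else:
        token = token[:pad_size]
    # word to id: anything that is not a key of vocab falls back to UNK
    return [vocab.get(word, vocab.get(UNK)) for word in token]


buggy = encode(['a', 'b'], pad_size=4, pad_with_id=True)
fixed = encode(['a', 'b'], pad_size=4, pad_with_id=False)
print(buggy)  # [0, 1, 2, 2] -> padding positions got the UNK id
print(fixed)  # [0, 1, 3, 3] -> padding positions got the PAD id
```

With the id-based padding, the integer 3 is looked up as if it were a word, is not found among the keys, and falls back to the UNK id. Padding with the PAD token itself lets the lookup resolve it normally.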

ysgncss commented 4 years ago

Awesome! Could we add each other as friends?

649453932 commented 4 years ago

Thank you so much! I'll fix it tonight when I get home.