649453932 / Chinese-Text-Classification-Pytorch

Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer, based on PyTorch, ready to use out of the box.

MIT License

5.27k stars 1.23k forks

Problem converting characters to ids #20

Closed duguiming111 closed 4 years ago

duguiming111 commented 4 years ago

    if pad_size:
        if len(token) < pad_size:
            token.extend([vocab.get(PAD)] * (pad_size - len(token)))
        else:
            token = token[:pad_size]
            seq_len = pad_size
    # word to id
    for word in token:
        words_line.append(vocab.get(word, vocab.get(UNK)))
    contents.append((words_line, int(label), seq_len))

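Here, the padding step above has already filled the sequence with the PAD id, so vocab.get(word, vocab.get(UNK)) looks up that id as if it were a word. The id is not a key in the vocabulary, so every padded position ends up with the UNK id.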
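A minimal sketch that reproduces the behaviour described above, assuming a toy vocabulary and the token strings `<UNK>` and `<PAD>`. The helper name `to_ids`, its `pad_with_id` flag, and the vocabulary contents are hypothetical and not taken from the repository:

```python
UNK, PAD = "<UNK>", "<PAD>"  # assumed token strings, for illustration only
vocab = {"a": 0, "b": 1, UNK: 2, PAD: 3}  # toy vocabulary


def to_ids(token, pad_size, pad_with_id):
    """Pad `token` to `pad_size`, then map every entry to an id."""
    token = list(token)
    seq_len = len(token)
    if pad_size:
        if len(token) < pad_size:
            # pad_with_id=True mimics the reported code (pads with the PAD id);
            # pad_with_id=False pads with the PAD token itself.
            filler = vocab.get(PAD) if pad_with_id else PAD
            token.extend([filler] * (pad_size - len(token)))
        else:
            token = token[:pad_size]
            seq_len = pad_size
    # word to id
    words_line = [vocab.get(word, vocab.get(UNK)) for word in token]
    return words_line, seq_len


buggy = to_ids(["a", "b"], 4, pad_with_id=True)
fixed = to_ids(["a", "b"], 4, pad_with_id=False)
print(buggy)  # ([0, 1, 2, 2], 2)
print(fixed)  # ([0, 1, 3, 3], 2)
```

Padding with the PAD token rather than its id is one possible fix: the later lookup then resolves the token to the PAD id instead of falling back to UNK.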

649453932 commented 4 years ago

PAD and UNK are already added when the vocabulary is built: vocab_dic.update({UNK: len(vocab_dic), PAD: len(vocab_dic) + 1})
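As a rough sketch of what that update does, assuming a two-entry `vocab_dic` and illustrative token strings (both are assumptions, not values from the repository): the special tokens do receive ids, but the dictionary is keyed by token, so the PAD id itself is still not a key.

```python
UNK, PAD = "<UNK>", "<PAD>"  # assumed token strings, for illustration only
vocab_dic = {"a": 0, "b": 1}  # toy vocabulary before the special tokens

# The dict literal is built before update() mutates vocab_dic, so both
# len(vocab_dic) calls see the same size: UNK gets the next free id and
# PAD the one after it.
vocab_dic.update({UNK: len(vocab_dic), PAD: len(vocab_dic) + 1})
print(vocab_dic)  # {'a': 0, 'b': 1, '<UNK>': 2, '<PAD>': 3}

pad_id = vocab_dic.get(PAD)
# The keys are tokens, not ids, so looking up the PAD id falls back to UNK.
print(pad_id in vocab_dic)  # False
print(vocab_dic.get(pad_id, vocab_dic.get(UNK)))  # 2
```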

duguiming111 commented 4 years ago

There really is a problem here! Please take a careful look!

649453932 commented 4 years ago

I see it now, thank you! I'll fix it when I get home tonight.