649453932 / Chinese-Text-Classification-Pytorch

Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN and Transformer, built on PyTorch and ready to use out of the box.
MIT License
5.27k stars 1.23k forks

Problem converting characters to ids #20

Closed: duguiming111 closed this issue 4 years ago

duguiming111 commented 4 years ago

    if pad_size:
        if len(token) < pad_size:
            token.extend([vocab.get(PAD)] * (pad_size - len(token)))
        else:
            token = token[:pad_size]
            seq_len = pad_size
    # word to id
    for word in token:
        words_line.append(vocab.get(word, vocab.get(UNK)))
    contents.append((words_line, int(label), seq_len))

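Here, vocab.get(word, vocab.get(UNK)) receives the PAD id that was appended during padding above. That id is not a key in the vocabulary, so every padding position ends up as the UNK id.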
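A minimal, self-contained sketch that reproduces the behaviour described above and shows one possible fix. It assumes the special-token strings '<UNK>' and '<PAD>', a character-level tokenizer and a toy two-character vocabulary; the encode function and its buggy flag are hypothetical, not code from the repository. The fix pads with the PAD token string before the id lookup, so the lookup maps it to the PAD id.

```python
UNK, PAD = '<UNK>', '<PAD>'  # assumed special-token strings


def encode(text, vocab, pad_size, buggy=False):
    """Hypothetical helper mirroring the quoted padding + word-to-id logic."""
    token = list(text)  # assumed character-level tokenizer
    seq_len = len(token)
    # buggy=True pads with the PAD *id* (as in the quoted code);
    # buggy=False pads with the PAD *token*, which the lookup below understands
    pad = vocab.get(PAD) if buggy else PAD
    if pad_size:
        if len(token) < pad_size:
            token.extend([pad] * (pad_size - len(token)))
        else:
            token = token[:pad_size]
            seq_len = pad_size
    # word to id: anything that is not a key (e.g. an integer id) falls back to UNK
    words_line = [vocab.get(word, vocab.get(UNK)) for word in token]
    return words_line, seq_len


vocab = {'好': 0, '的': 1}  # hypothetical toy vocabulary
vocab.update({UNK: len(vocab), PAD: len(vocab) + 1})  # UNK -> 2, PAD -> 3

print(encode('好的', vocab, 5, buggy=True))  # ([0, 1, 2, 2, 2], 2)
print(encode('好的', vocab, 5))              # ([0, 1, 3, 3, 3], 2)
```

With the buggy padding the three padding positions come out as 2 (UNK); padding with the token string yields 3 (PAD) as intended.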

649453932 commented 4 years ago

PAD and UNK were already added when the vocabulary was built: vocab_dic.update({UNK: len(vocab_dic), PAD: len(vocab_dic) + 1})
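The update above makes the special-token strings keys of vocab_dic and their ids values. The padding in the reported code, however, appends the id (a value), not the token (a key). A toy sketch of the distinction, assuming the strings '<UNK>' and '<PAD>' and a hypothetical two-character vocabulary:

```python
UNK, PAD = '<UNK>', '<PAD>'  # assumed special-token strings

vocab_dic = {'好': 0, '的': 1}  # hypothetical toy vocabulary
# Same update as in the comment: token strings become keys, their ids become values
vocab_dic.update({UNK: len(vocab_dic), PAD: len(vocab_dic) + 1})

pad_id = vocab_dic[PAD]
print(PAD in vocab_dic)     # True: the PAD token string is a key
print(pad_id in vocab_dic)  # False: the PAD id 3 is only a value
# Looking up the id therefore falls back to the UNK id
print(vocab_dic.get(pad_id, vocab_dic.get(UNK)))  # 2
```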

duguiming111 commented 4 years ago

[screenshot] There really is a problem here! Please check it carefully.

649453932 commented 4 years ago

Got it, thanks! I'll fix it when I get home tonight.