ShawnyXiao / TextClassification-Keras

Text classification models implemented in Keras, including: FastText, TextCNN, TextRNN, TextBiRNN, TextAttBiRNN, HAN, RCNN, RCNNVariant, etc.
MIT License
816 stars, 187 forks

HAN's main.py suspicious reshape #14

Open ei-grad opened 3 years ago

ei-grad commented 3 years ago

The IMDB dataset is tokenized by words, so after the reshape in https://github.com/ShawnyXiao/TextClassification-Keras/blob/master/model/HAN/main.py#L20-L23 the word-level dimension contains whole word tokens, not characters or word parts. Does that make sense at all? If it is there only for illustration, it may be worth adding a comment next to it.
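
To make the concern concrete, here is a minimal NumPy sketch of what padding followed by such a reshape does to word-level token ids. The sizes, the toy document, and the variable names are made up for illustration and are not taken from the repository. The manual zero-fill only mimics the default left-padding of Keras `pad_sequences`.

```python
import numpy as np

# Hypothetical sizes, kept small for readability (not the repo's values).
maxlen_sentence = 3   # number of "sentences" per document
maxlen_word = 4       # number of tokens per "sentence"

# A toy document given as word-level token ids, the way IMDB is tokenized.
doc = [11, 12, 13, 14, 15, 16, 17]

# Left-pad with zeros to a fixed length, mimicking pad_sequences' default.
total = maxlen_sentence * maxlen_word
padded = np.zeros(total, dtype=np.int64)
padded[total - len(doc):] = doc

# Reshape the flat sequence to (samples, maxlen_sentence, maxlen_word).
x = padded.reshape((1, maxlen_sentence, maxlen_word))
print(x[0])
```

In this sketch every entry along the innermost axis is still a whole word id, and the "sentence" boundaries are just fixed-size cuts through the flat token sequence.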