HAN's main.py suspicious reshape

ShawnyXiao / TextClassification-Keras

Text classification models implemented in Keras, including: FastText, TextCNN, TextRNN, TextBiRNN, TextAttBiRNN, HAN, RCNN, RCNNVariant, etc.

MIT License

816 stars, 187 forks

Open ei-grad opened 3 years ago

ei-grad commented 3 years ago

The IMDB dataset is tokenized by words, so after the reshape in https://github.com/ShawnyXiao/TextClassification-Keras/blob/master/model/HAN/main.py#L20-L23 the word-level dimension contains whole word tokens, not characters or word pieces. In other words, each "sentence" is just a fixed-size chunk of consecutive words, and each "word" position holds a single word id. Does that make sense at all? If it is there just for illustration, then it may be worth adding a comment near it.
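To make the concern concrete, here is a minimal sketch of what a row-major reshape does to a flat, word-tokenized sequence. It is not the code from main.py: the sizes, the word ids, and the helper `reshape_row_major` are all hypothetical, and the helper is a pure-Python stand-in for a NumPy-style reshape so the snippet runs without dependencies.

```python
# Hypothetical sizes, kept small for readability (not the values from main.py).
MAXLEN_SENTENCE = 3  # number of "sentences" per document
MAXLEN_WORD = 4      # number of slots per "sentence"


def reshape_row_major(flat, rows, cols):
    """Split a flat list into `rows` chunks of `cols` items each,
    mimicking a row-major (C-order) reshape to shape (rows, cols)."""
    if len(flat) != rows * cols:
        raise ValueError("flat length must equal rows * cols")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# One review as a flat list of made-up word ids, front-padded with 0
# to length MAXLEN_SENTENCE * MAXLEN_WORD (pre-padding is an assumption here).
review = [0, 0, 0, 12, 7, 431, 9, 2, 88, 5, 19, 3]

doc = reshape_row_major(review, MAXLEN_SENTENCE, MAXLEN_WORD)

for i, chunk in enumerate(doc):
    # Each row is an arbitrary run of consecutive words, not a real sentence,
    # and each element is a whole-word id, not a character or word piece.
    print(i, chunk)
```

Under these assumptions the innermost axis indexes words within a fixed-size chunk, so the lower-level encoder attends over words and the upper-level encoder attends over chunks, with chunk boundaries unrelated to actual sentence boundaries.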