AlexGidiotis / Document-Classifier-LSTM

A bidirectional LSTM with attention for multiclass/multilabel text classification.
MIT License

Regarding HAN implementation #4

Closed: spartian closed this issue 5 years ago

spartian commented 5 years ago

The HAN paper considers only words that appear more than 5 times. I don't think this is implemented in the code. Also, does stop word removal take place in the paper? As I mentioned, if stop words are repeated 5 or more times then even they would have to be kept. What are your views on this?
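
For illustration, a minimal sketch of that frequency filter in plain Python (the names `build_vocab` and `MIN_COUNT` are made up here and are not part of this repo or the paper):

```python
from collections import Counter

MIN_COUNT = 5  # the "appears more than 5 times" threshold from the HAN paper

def build_vocab(tokenized_docs, min_count=MIN_COUNT):
    """tokenized_docs: iterable of token lists. Returns the set of kept words."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    # Words at or below the threshold would later be mapped to an UNK token.
    return {tok for tok, c in counts.items() if c > min_count}

# Toy usage with the threshold lowered so the effect is visible:
docs = [["the", "cat", "sat"], ["the", "dog", "sat", "down"], ["the", "end"]]
print(build_vocab(docs, min_count=1))  # {'the', 'sat'}
```

Stop words that clear the threshold would survive this filter unless a separate stop word list is applied on top of it, which is exactly the question above.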

AlexGidiotis commented 5 years ago

You are welcome to try both of those modifications and let us know whether they actually improve the performance.

spartian commented 5 years ago

One more question before closing the issue: has word attention been implemented in HAN? I ask because I can see two BiLSTM layers in HAN, but the sentence BiLSTM appears to be applied first and the word BiLSTM after it. Shouldn't it be the reverse, or am I missing something?

AlexGidiotis commented 5 years ago

That's just me naming the layers in a confusing way. The first layer, "sent_blstm", runs at the word level and encodes each sentence, and then "blstm" runs at the sentence level and encodes the whole document.
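
Roughly, the hierarchy looks like this. This is only a minimal sketch assuming `tensorflow.keras`; the dimensions are placeholders, the attention layers are omitted, and only the layer names mirror the repo's "sent_blstm"/"blstm" naming:

```python
from tensorflow.keras.layers import (Input, Embedding, Bidirectional, LSTM,
                                     TimeDistributed, Dense)
from tensorflow.keras.models import Model

MAX_SENTS, MAX_WORDS = 15, 50               # sentences per doc, words per sentence (assumed)
VOCAB, EMB_DIM, N_CLASSES = 20000, 200, 10  # assumed sizes

# Word-level encoder ("sent_blstm"): runs over the words of ONE sentence
# and produces a single sentence vector.
word_in = Input(shape=(MAX_WORDS,), dtype='int32')
x = Embedding(VOCAB, EMB_DIM)(word_in)
x = Bidirectional(LSTM(100), name='sent_blstm')(x)
sent_encoder = Model(word_in, x)

# Document-level encoder ("blstm"): runs over the sequence of sentence vectors
# and produces a single document vector.
doc_in = Input(shape=(MAX_SENTS, MAX_WORDS), dtype='int32')
y = TimeDistributed(sent_encoder)(doc_in)        # encode each sentence with the word encoder
y = Bidirectional(LSTM(100), name='blstm')(y)    # encode the whole document
out = Dense(N_CLASSES, activation='sigmoid')(y)  # sigmoid for multilabel output (assumed)
model = Model(doc_in, out)
model.summary()
```

So the word-level BiLSTM does run first, it is just wrapped in TimeDistributed and labeled "sent" because its output is a sentence encoding.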

spartian commented 5 years ago

Thank you for clearing that up.