brightmart / text_classification

all kinds of text classification models and more with deep learning
MIT License
7.83k stars 2.57k forks

Is it possible to implement Hierarchical Attention Network with parsing real sentences? #58

Closed: acadTags closed this issue 6 years ago

acadTags commented 6 years ago

Thank you very much for sharing this.

I find that in your implementation of the Hierarchical Attention Network (HAN), the sentences are formed by splitting the text into chunks of equal length. This, however, is not the true sentence length in the data.

I wonder whether it would be easy to change this to use a sentence parser to find the real sentences. How would the performance differ?

Please let me know if you have any ideas on how to parse the real sentences based on your HAN code. Many thanks!

brightmart commented 6 years ago

Hi,

Good to hear from you.

I think it is a good idea to send real sentences to the HAN network.

For example, if the total length of a document is 400 words, it is currently split into 10 sentences of 40 words each.

So you can prepare 10 sentence slots, each with a maximum length of 40 words.

Any words beyond the 40th in a sentence will be truncated; pad where necessary.

By doing this in the data-processing step, you can send real sentences to the network.

I think it may improve performance.
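The preprocessing described above could be sketched roughly as follows. This is only an illustration, not code from this repository: the regex sentence splitter, the whitespace tokenizer, the `word2id` vocabulary, the function names, and the padding/unknown ids (`PAD_ID = 0`, `unk_id = 1`) are all assumptions, and a proper sentence tokenizer would likely do better on real data.

```python
import re

PAD_ID = 0  # assumed id of the padding token (hypothetical)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def document_to_grid(text, word2id, num_sentences=10, sentence_len=40, unk_id=1):
    """Map a document to a num_sentences x sentence_len grid of token ids.

    Each real sentence gets its own row; long sentences are truncated,
    short ones are padded, and missing sentences become all-padding rows.
    """
    grid = []
    for sentence in split_sentences(text)[:num_sentences]:
        # Whitespace tokenization; punctuation stays attached to words.
        ids = [word2id.get(w, unk_id) for w in sentence.lower().split()]
        ids = ids[:sentence_len]                     # truncate long sentences
        ids += [PAD_ID] * (sentence_len - len(ids))  # pad short sentences
        grid.append(ids)
    while len(grid) < num_sentences:                 # pad missing sentences
        grid.append([PAD_ID] * sentence_len)
    return grid


def flatten(grid):
    """Flatten the grid row by row into one sequence of token ids."""
    return [token for row in grid for token in row]
```

With the defaults, `flatten(document_to_grid(text, word2id))` gives a sequence of 400 ids. If the model cuts its input into 10 equal chunks of 40, each chunk would then line up with one real sentence, so the network itself might not need to change.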

Bright



acadTags commented 6 years ago

Hi Bright,

Thank you. I find that using real sentences can sometimes boost the results by about 1%, but sometimes the results are even slightly lower than with the fixed-length ("fake") sentences. Either way, the difference is very small.

Best wishes, acadTags