XuezheMax / LasagneNLP

NLP tools on Lasagne
Apache License 2.0

Using BiLSTM-CNN-CRF on Twitter corpora #7

Closed. roskoN closed this issue 7 years ago.

roskoN commented 7 years ago

Hello,

First of all, thank you very much for making your work open-source to the public!

I am trying to reuse the code and apply it to Twitter NER as described in this paper: http://noisy-text.github.io/2016/pdf/WNUT20.pdf.

I haven't yet added the orthographic features described in the paper. However, running some tests has been unsuccessful. Training seems to reach some (probably local) minimum of the loss function early in the second epoch, but when evaluated on the dev and test sets the model always achieves the same accuracy score, because it tags every word with O. I am using BIO encoding with 10 different NER types, which gives 21 possible labels in total (a B- and an I- label for each type, plus O).
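To confirm the collapse: a model that tags every token as O gets a constant accuracy equal to the fraction of O tokens in the data, so a dev accuracy stuck at exactly that value is the symptom above. A minimal sketch of that baseline (`dev_labels` is a placeholder for my flattened dev-set gold tags):

```python
# Accuracy of the trivial tagger that labels every token "O". If the model's
# dev accuracy is stuck at exactly this value, it has collapsed to the
# majority class. `dev_labels` is a placeholder for the flattened gold tags.
def all_o_baseline(dev_labels):
    o_count = sum(1 for tag in dev_labels if tag == "O")
    return o_count / float(len(dev_labels))
```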

Is the problem related to getting stuck in this minimum, or is it just too soon to expect any reasonable results? Could you offer some advice on the problem, please?

Here is my Python notebook: https://gist.github.com/roskoN/7f78c574fc56c11f8423c187afaafe68. At the very bottom there is output from the training.

Thank you!

Kind Regards, Rosko

XuezheMax commented 7 years ago

Hello,

Could you provide more details, like the parameters you were using?


roskoN commented 7 years ago

Hello,

Thank you for answering!

Here are the parameters:

- update algorithm: AdaDelta
- regularization: L2, gamma = 1e-6
- dropout: True
- gradient clipping: 5.0
- peepholes: False
- number of LSTM units: 100
- number of conv. filters: 300
- char. embedding dim.: 30
- word embedding dim.: 400
- pre-trained word embeddings: http://www.fredericgodin.com/software/
- batch size: 50
- number of sentences: 2394
- train-dev-test split: 80%-10%-10%
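For concreteness, here is how these settings map onto a config (a sketch only; the key names are mine, not the actual flags of the training script):

```python
# Hypothetical config mirroring the parameters above; the key names are
# illustrative, not the real CLI flags of LasagneNLP's training script.
config = {
    "update": "adadelta",
    "regular": "l2",
    "gamma": 1e-6,          # L2 regularization weight
    "dropout": True,
    "grad_clipping": 5.0,
    "peepholes": False,
    "num_units": 100,       # LSTM hidden units per direction
    "num_filters": 300,     # character-CNN filters
    "char_embedd_dim": 30,
    "embedd_dim": 400,      # matches the 400-d pre-trained Twitter embeddings
    "batch_size": 50,
}
```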

XuezheMax commented 7 years ago

My suggestion is to first train a smaller model: for example, decrease the number of conv. filters to 30 and the word embedding dimension to 100. Then use a small batch size, like 10, because your training data is pretty small. Let me know if it works, thanks.
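Concretely, something like this (a sketch with illustrative key names, matching the config posted above):

```python
# Reduced settings for a first, smaller model; key names are illustrative.
small_config = {
    "num_filters": 30,   # down from 300
    "embedd_dim": 100,   # down from 400; note this requires 100-d word
                         # vectors (other pre-trained embeddings, or random
                         # initialization instead of the 400-d ones)
    "batch_size": 10,    # down from 50, since ~2400 sentences is very little
}
```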


roskoN commented 7 years ago

Hey,

thank you for your help! It really was useful. After training for a bit longer, a better minimum was reached and the classifier started giving better results.

Thank you very much for everything!