prhbrt closed 9 years ago
True. A lot of this could be simplified now actually.
I'm currently reading about convolutional neural networks to do similar text classification, in my case also insult detection. The fun thing about this is that the convolutional layer can learn correlations between neighbouring words, and hence is more likely to recognize negations. On the other hand, I'm afraid of overfitting when there's only a limited number of sentences to train on.
Maybe also try LSTMs on word-level? Are you doing character or word-level CNNs?
> Maybe also try LSTMs on word-level?
I'd have to look into that. "Long Short-Term Memory" is relatively new to me, but my coworkers should have experience with it. Thanks!
> Are you doing character or word-level CNNs?
Both. It's called CharSCNN: first a convolutional layer detects local correlations on a character level, and later on another convolutional layer detects local correlations on a word level (using as input the concatenation of a feature vector for the word and the output of the first convolutional layer). Here's the paper: http://www.aclweb.org/anthology/C14-1008
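To make the two-stage idea a bit more concrete, here's a rough, untrained sketch of the data flow in plain Python. It is only an illustration of the description above, not the paper's implementation: the dimensions, the random weights and the helper names (`conv_max`, `embed`, `sentence_vector`) are all made up, biases are left out, and it assumes a non-empty sentence.

```python
import random

random.seed(0)

# Hypothetical sizes, chosen only to keep the example small.
CHAR_DIM, WORD_DIM, CHAR_OUT, SENT_OUT, WINDOW = 4, 6, 5, 8, 3


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def conv_max(vectors, weights, window):
    """Apply one linear map to every window of neighbouring vectors,
    then take the maximum over all positions (max pooling)."""
    dim = len(vectors[0])
    pad = [[0.0] * dim for _ in range(window // 2)]
    padded = pad + vectors + pad
    outputs = []
    for start in range(len(padded) - window + 1):
        flat = [x for vec in padded[start:start + window] for x in vec]
        outputs.append([sum(w * x for w, x in zip(row, flat)) for row in weights])
    return [max(column) for column in zip(*outputs)]


char_emb, word_emb = {}, {}  # lookup tables, filled lazily with random vectors
W_char = rand_matrix(CHAR_OUT, WINDOW * CHAR_DIM)
W_sent = rand_matrix(SENT_OUT, WINDOW * (WORD_DIM + CHAR_OUT))


def embed(table, key, dim):
    if key not in table:
        table[key] = [random.uniform(-0.1, 0.1) for _ in range(dim)]
    return table[key]


def sentence_vector(sentence):
    joint = []
    for word in sentence.lower().split():
        # Stage 1: convolution over the characters of a single word.
        chars = [embed(char_emb, c, CHAR_DIM) for c in word]
        char_features = conv_max(chars, W_char, WINDOW)
        # Concatenate the word's own feature vector with the character features.
        joint.append(embed(word_emb, word, WORD_DIM) + char_features)
    # Stage 2: convolution over the words of the sentence.
    return conv_max(joint, W_sent, WINDOW)


vec = sentence_vector("you are not an idiot")
print(len(vec))  # 8, i.e. SENT_OUT
```

In a real model the resulting sentence vector would go through hidden layers and a softmax, and all weights and embeddings would be trained rather than random.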
Ah, I haven't seen that one.
A third party (i.e. not the authors) implemented the pipeline here: https://github.com/satwantrana/CharSCNN
They stated 70% accuracy, which isn't too fancy I guess.
well depends on the dataset ;)
Or possibly a method that is prone to overfitting or underfitting :) But of course, there's a 100% accuracy dataset for each classifier :P
I would think auc_score was renamed to roc_auc_score, but couldn't find any proof of it.
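If that guess is right, an import along these lines should keep working across versions. This is a sketch under that assumption: it tries `sklearn.metrics.roc_auc_score` first, then the older `auc_score` name, and as a last resort defines a small pure-Python stand-in (my own helper, not part of scikit-learn) that computes the same pairwise-ranking quantity for binary labels.

```python
try:
    from sklearn.metrics import roc_auc_score  # name assumed for newer releases
except ImportError:
    try:
        # Assumed older name for the same function.
        from sklearn.metrics import auc_score as roc_auc_score
    except ImportError:
        def roc_auc_score(y_true, y_score):
            """Stand-in: fraction of (positive, negative) pairs where the
            positive gets the higher score; ties count as half."""
            pos = [s for t, s in zip(y_true, y_score) if t == 1]
            neg = [s for t, s in zip(y_true, y_score) if t != 1]
            wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
            return wins / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ranked correctly.
print(roc_auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```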