Testing in Weka - Githubissues

CraigBryan / tweet-mood-analyzer

Assignment 2 for CSI 4107


Testing in Weka #8

Open BonShillings opened 9 years ago

BonShillings commented 9 years ago

So with the new features like q_marks, e_marks, pos_score, etc., I still haven't been able to outperform the baseline performance of 45%, which was achieved using J48 and the StringToWordVector filter. Using q_marks, etc. matches this performance. Even combinations of STWV and the above features achieve around the baseline performance.

I'm really not sure how to increase the quality of the results over the baseline. Hopefully, new features will help, but maybe it would be best to look into other Weka algorithms... I've tried SMO (SVM), J48 (decision tree), and Naive Bayes, and they have similar performance.
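For anyone following along, here is a rough sketch of what the punctuation features could look like. It assumes q_marks and e_marks are simple counts of question marks and exclamation marks per tweet, which the thread does not actually confirm, and the function name `punctuation_features` is made up for illustration. pos_score is left out because its definition isn't given here.

```python
def punctuation_features(tweet):
    """Return hypothetical q_marks / e_marks features for one tweet.

    Assumption: each feature is just a raw count of the character.
    """
    return {
        "q_marks": tweet.count("?"),  # number of question marks
        "e_marks": tweet.count("!"),  # number of exclamation marks
    }


print(punctuation_features("Really?? No way!"))
```

Counts like these carry very little signal on their own, which would be consistent with them only matching the baseline.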

BonShillings commented 9 years ago

Got 50% using BayesNet just now. Used all features, and then StringToWordVector on the string itself. Finally some improvement.

CraigBryan commented 9 years ago

I'm very close to getting the bag of words implementation working. It's just a matter of cleaning up tokens so they don't make Weka cry.

On Tue, Mar 31, 2015 at 9:53 PM, BonShillings notifications@github.com wrote:

Got 50% using BayesNet just now. Used all features, and then StringToWordVector on the string itself. Finally some improvement.


CraigBryan commented 9 years ago

Ok. Just pushed the bag of words implementation. Try that out and see if it is worse or better.
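As a rough illustration of the token cleanup mentioned earlier in the thread, something like the following could work. This is only a sketch, not the code that was pushed: it assumes the bag of words tokens end up as ARFF attribute names, where characters such as quotes, commas, braces, and percent signs tend to cause parsing trouble, and the name `clean_tokens` is hypothetical.

```python
import re

# Anything that is not a lowercase letter, digit, or underscore gets dropped.
_UNSAFE_CHARS = re.compile(r"[^a-z0-9_]+")


def clean_tokens(text):
    """Lowercase, split on whitespace, and strip unsafe characters from each token."""
    tokens = []
    for raw in text.lower().split():
        token = _UNSAFE_CHARS.sub("", raw)
        if token:  # skip tokens that were pure punctuation
            tokens.append(token)
    return tokens


print(clean_tokens("Can't wait, 100% {ready}!"))
```

This is deliberately aggressive: it also throws away emoticons and hashtag symbols, which may matter for mood classification, so a gentler rule could be worth trying.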

BonShillings commented 9 years ago

Hey, the BOW features were added with perfect format.

There is an issue with space, however. I think some terms need to be trimmed. I think some Zipf distribution theory might help us: we can trim the words that occur very infrequently (i.e., words that appear in <= 3 documents, or something like this).
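A minimal sketch of that trimming idea, assuming each tweet has already been turned into a list of tokens. The function name `prune_rare_terms` and the default cutoff are illustrative only; the cutoff of 4 just mirrors the "<= 3 documents" suggestion above.

```python
from collections import Counter


def prune_rare_terms(docs, min_doc_freq=4):
    """Drop terms that appear in fewer than min_doc_freq documents.

    docs is a list of token lists, one per tweet.
    Returns the pruned documents and the surviving vocabulary.
    """
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    vocab = {term for term, df in doc_freq.items() if df >= min_doc_freq}
    pruned = [[t for t in tokens if t in vocab] for tokens in docs]
    return pruned, vocab


docs = [["good", "day"], ["good", "rain"], ["good"], ["good", "day"]]
pruned, vocab = prune_rare_terms(docs)
print(vocab)
```

Because word frequencies in text are roughly Zipfian, most distinct terms sit in the rare tail, so even a small cutoff like this should cut the feature count substantially.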

BonShillings commented 9 years ago

I implemented this with the last push and was able to get 53% precision using Complement Naive Bayes.