TromboneDavies / PolarOps


Identify possibly useful non-bigram features #44

Open divilian opened 3 years ago

divilian commented 3 years ago

To be clear, this issue is: "add features from #31 to the classifier in a quest to improve its performance, and modify the classifier in other useful ways."

  1. @TromboneDavies: make "number of comments" be "avg length of comment" (complete 6/25)
  2. @akochans: For the four initial TJ features (avg word length, number of comments, frequency of links, frequency of in-thread quotes), compare polarized vs. non-polarized on each feature, before even including them in the classifier. (First cut: just compare means. Second cut: compare histograms/KDEs.)
  3. @rockladyeagles: figure out the "right" way to incorporate features like "avg word length" with a bunch of "does it contain trump" features. (now #49)
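
For the "first cut" in item 2, a minimal sketch of comparing per-class means might look like the following. The dict layout, the key names (`polarized`, `avg_word_length`, `link_freq`), and the numbers are all made up for illustration; they are not the project's actual data format or results.

```python
from statistics import mean


def compare_means(samples, feature):
    """Return (polarized mean, non-polarized mean) for one feature."""
    polarized = [s[feature] for s in samples if s["polarized"]]
    non_polarized = [s[feature] for s in samples if not s["polarized"]]
    return mean(polarized), mean(non_polarized)


# Hypothetical rows, NOT real project data.
samples = [
    {"polarized": True, "avg_word_length": 4.0, "link_freq": 0.10},
    {"polarized": True, "avg_word_length": 5.0, "link_freq": 0.30},
    {"polarized": False, "avg_word_length": 4.5, "link_freq": 0.05},
    {"polarized": False, "avg_word_length": 5.5, "link_freq": 0.15},
]

for feature in ("avg_word_length", "link_freq"):
    pol, non = compare_means(samples, feature)
    print(f"{feature}: polarized={pol:.3f}  non-polarized={non:.3f}")
```

If the two means for a feature are close, the histogram/KDE comparison in the second cut is probably still worth doing, since distributions with similar means can differ in shape.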

divilian commented 3 years ago

Regarding point 3 above, this should be partially related to section 23.2 of Crystal Ball vol. 2 (z-scoring features on different scales).
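
As a rough sketch of the idea, assuming "z-scoring" here means the usual standardization (subtract the mean, divide by the standard deviation), a numeric feature like average word length could be rescaled as below. The function name `z_scores`, the choice of population standard deviation, and the sample values are assumptions for illustration, not taken from the book or from the classifier code.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map everything to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature values, NOT real project data.
avg_word_length = [4.0, 5.0, 6.0]
z = z_scores(avg_word_length)
print(z)
```

After this step the feature is expressed in standard deviations from its mean, which puts it on a scale more comparable to the 0/1 "does it contain trump" style features. Whether those binary features should be standardized too is the open question in item 3.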