TromboneDavies / PolarOps


Figure out how to combine features #49

Open divilian opened 3 years ago

divilian commented 3 years ago

Figure out the "right" way to incorporate features like "avg word length" with a bunch of "does it contain trump" features. (From #44)

divilian commented 2 years ago

An email exchange with Dave on this:

Dave replies:

Those are the right answers -- assuming the "magic of neural networks". In practice, however, some measure of "feature selection" or "dimensionality reduction" ought to be performed before giving the data to the neural network, because you don't want the noise in the data to overwhelm the signal. But if you take neural networks to be magic, you needn't worry about any of that.
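For illustration only, here is one very simple form the "feature selection" step could take: drop word features that occur in too few documents before building the vectors. This is a sketch under stated assumptions, not something from the email exchange. The toy documents, the `min_df` cutoff, and the function names are all hypothetical, and a document-frequency cutoff is just one of many possible selection criteria.

```python
from collections import Counter

# Hypothetical, already-tokenized toy documents (not PolarOps data).
docs = [
    ["trump", "said", "it"],
    ["trump", "is", "wrong"],
    ["a", "quiet", "morning", "it", "is"],
]

def select_by_document_frequency(token_lists, min_df=2):
    """Return the sorted words that appear in at least min_df documents."""
    doc_freq = Counter()
    for tokens in token_lists:
        # Use set() so a word counts once per document, however often it repeats.
        doc_freq.update(set(tokens))
    return sorted(w for w, n in doc_freq.items() if n >= min_df)

def count_vector(tokens, vocab):
    """Word counts for one document, restricted to the selected vocabulary."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

kept_vocab = select_by_document_frequency(docs, min_df=2)
vectors = [count_vector(tokens, kept_vocab) for tokens in docs]
```

Rare words are a common source of noise in word-count features, so a cutoff like this shrinks the input without touching any hand-built meta-features, which would be appended afterwards.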


Stephen asked:

The question is: "what if we have a ton of features which are (say) word counts, but then we also have some other 'meta-features' - such as text length, lexical diversity, number of exclamation points or emojis, etc. - that we suspect may help with classification and so we want to mix into the soup. How do we add three or four features to the thousands of word count features and not have the significance of those three or four overwhelmed by the sheer number of word count features?"

I think your answer was some form of: "don't worry about it; to the extent that those three or four features truly are indicative of label, the neural net will figure that out without you having to do anything special. Just stick 'em in there with everything else."

Another part of your answer might (or might not) have been: "oh, but make sure to scale these meta-features so they're on the same scale as the many word features."
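A minimal sketch of the "stick 'em in there, but scale them" idea, assuming plain word counts plus four meta-features and per-column standardization (zero mean, unit variance). The toy texts, the tokenizer, and all function names are hypothetical and chosen only for illustration. Standardization is one reasonable choice of scaling, not necessarily the one Dave had in mind.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical toy corpus (not PolarOps data).
texts = [
    "trump said it again!!!",
    "what a lovely quiet morning",
    "no no no, trump is wrong!",
]

def tokenize(text):
    # Lowercase, split on whitespace, strip surrounding punctuation.
    words = (w.strip(".,!?") for w in text.lower().split())
    return [w for w in words if w]

vocab = sorted({w for t in texts for w in tokenize(t)})

def word_counts(text):
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]

def meta_features(text):
    words = tokenize(text)
    return [
        float(len(text)),               # text length in characters
        mean(len(w) for w in words),    # average word length
        len(set(words)) / len(words),   # lexical diversity (type/token ratio)
        float(text.count("!")),         # number of exclamation points
    ]

def standardize_columns(rows):
    # Rescale every column to zero mean and unit variance.
    # Constant columns carry no information and are mapped to 0.
    scaled_columns = []
    for col in zip(*rows):
        mu, sigma = mean(col), pstdev(col)
        scaled_columns.append(
            [(x - mu) / sigma if sigma > 0 else 0.0 for x in col]
        )
    return [list(row) for row in zip(*scaled_columns)]

# Word-count features and meta-features go into one vector per text,
# then every column is put on the same scale.
raw = [word_counts(t) + meta_features(t) for t in texts]
X = standardize_columns(raw)
```

Without the scaling step, a feature like text length (tens or hundreds of characters) would sit next to word counts that are mostly 0 or 1, and the raw magnitudes would differ by orders of magnitude. After standardization each column, word or meta, has a comparable range when it reaches the network.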