ajschumacher / gadsdc

materials for General Assembly Data Science DC course

split tf-idf from random forest bit #41

Closed. ajschumacher closed this 10 years ago

ajschumacher commented 10 years ago

probably these should be two different files?

ajschumacher commented 10 years ago

put it in logistic maybe?

ajschumacher commented 10 years ago

Looking at this now, I'm not convinced that it makes more sense in logistic. The tf-idf part is tied in pretty tightly with the example used for doing forests, so I don't think it's worth splitting out. One possibility for a future run of the course would be to pull all the text processing / NLP things (dummying, CountVectorizer, stemming, tf-idf, anything else, maybe LDA?) into an NLP-specific class.
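
For reference, here is a minimal sketch of what a standalone tf-idf step could look like if it were kept separate from the forest example. It is not the course's code: `tokenize`, `tfidf`, and the three sample documents are made up for illustration, and it uses the plain textbook weighting (term frequency times `log(N / df)`). scikit-learn's `TfidfVectorizer` applies smoothing and normalization by default, so its numbers would differ.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document (textbook tf-idf)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # tf = share of the document's tokens; idf = log(N / df).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample documents, for illustration only.
docs = ["the cat sat", "the dog sat", "the cat ran"]
features = tfidf(docs)
```

A term that appears in every document ("the") gets weight zero, while a term unique to one document ("dog") gets the largest weight. The resulting per-document features do not depend on any particular model, so in principle they could feed a forest, a logistic regression, or anything else.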