SvenAG / SNLP-Final-Project

SNLP Final Project

HTML Frequency (logistic regression per tag) #5

Status: Open. rob-nyu opened this issue 10 years ago.

rob-nyu commented 10 years ago

Train a separate logistic regression model FOR EACH TAG, using a TF-IDF feature matrix built only from the words that appear inside that tag.
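A minimal sketch of this step with scikit-learn, assuming each page has already been parsed into a dict that maps a tag name to the text found inside that tag. The toy pages, the tag list (`title`, `h1`, `p`) and the helper `train_per_tag_models` are hypothetical and only illustrate the shape of the idea.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: each page maps an HTML tag to the text inside that tag.
pages = [
    {"title": "easy pasta recipe", "h1": "pasta recipe", "p": "boil water add pasta"},
    {"title": "breaking news today", "h1": "election results", "p": "votes counted tonight"},
    {"title": "chocolate cake recipe", "h1": "bake a cake", "p": "mix flour sugar cocoa"},
    {"title": "stock market update", "h1": "shares fall", "p": "market closed lower today"},
]
labels = [1, 0, 1, 0]  # 1 = evergreen, 0 = not evergreen
tags = ["title", "h1", "p"]  # hypothetical tag list


def train_per_tag_models(pages, labels, tags):
    """Fit one TF-IDF + logistic regression pipeline per tag (hypothetical helper)."""
    models = {}
    for tag in tags:
        # A page without this tag contributes an empty document, so every page
        # still gets a row in this tag's TF-IDF matrix.
        docs = [page.get(tag, "") for page in pages]
        model = make_pipeline(TfidfVectorizer(), LogisticRegression())
        model.fit(docs, labels)
        models[tag] = model
    return models


models = train_per_tag_models(pages, labels, tags)
```

Wrapping the vectorizer and classifier in one pipeline keeps each tag's vocabulary tied to its own model, which is what "built by looking at the words for each tag" requires.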

Use these models to predict a score for every entry (webpage). Each predicted score is that model's estimated probability that the page is evergreen.

Use each model's probability score as a new feature. The result is a matrix with one row per webpage and one column per tag, where each cell holds the probability that the corresponding tag's model assigns to that page.
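The prediction and matrix-building steps above can be sketched as follows. One assumption goes beyond the issue: the probabilities here are out-of-fold (via scikit-learn's `cross_val_predict`), so each page is scored by a model that never saw it during training. Scoring pages with models trained on those same pages tends to produce overconfident probabilities that make the next stage look better than it is. The toy data and the helper `tag_probability_matrix` are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: six pages, each mapping a tag to its text.
pages = [
    {"title": "easy pasta recipe", "h1": "pasta recipe", "p": "boil water add pasta"},
    {"title": "breaking news today", "h1": "election results", "p": "votes counted tonight"},
    {"title": "chocolate cake recipe", "h1": "bake a cake", "p": "mix flour sugar cocoa"},
    {"title": "stock market update", "h1": "shares fall", "p": "market closed lower today"},
    {"title": "how to tie a tie", "h1": "tie knots", "p": "cross the wide end over"},
    {"title": "match report tonight", "h1": "final score", "p": "team won late goal"},
]
labels = np.array([1, 0, 1, 0, 1, 0])  # 1 = evergreen
tags = ["title", "h1", "p"]


def tag_probability_matrix(pages, labels, tags, cv=3):
    """Return an (n_pages, n_tags) matrix of out-of-fold P(evergreen) per tag."""
    columns = []
    for tag in tags:
        docs = [page.get(tag, "") for page in pages]
        model = make_pipeline(TfidfVectorizer(), LogisticRegression())
        # Each page's probability comes from a model fit on the other folds.
        proba = cross_val_predict(model, docs, labels, cv=cv, method="predict_proba")
        # Classes are sorted as [0, 1], so column 1 is P(evergreen).
        columns.append(proba[:, 1])
    return np.column_stack(columns)


X_meta = tag_probability_matrix(pages, labels, tags)
print(X_meta.shape)  # (6, 3): one row per page, one column per tag
```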

Train a linear regression model on this new feature space, then inspect the weight learned for each tag and evaluate how well the combined model performs.
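A minimal sketch of the final stage, assuming a per-tag probability matrix like the one described above is already available. The numbers below are made up purely for illustration and are not results from this project. ROC AUC on out-of-fold predictions is an assumed choice of metric for "how it does"; the issue does not specify one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold, cross_val_predict

tags = ["title", "h1", "p"]  # hypothetical tag list, one column per tag
# Made-up per-tag probabilities (rows = pages, columns = tags), illustration only.
X_meta = np.array([
    [0.81, 0.64, 0.72],
    [0.22, 0.41, 0.35],
    [0.77, 0.58, 0.69],
    [0.30, 0.47, 0.28],
    [0.69, 0.55, 0.61],
    [0.25, 0.39, 0.44],
])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = evergreen

# Fit on all pages to read off one weight per tag.
meta_model = LinearRegression().fit(X_meta, y)
for tag, weight in zip(tags, meta_model.coef_):
    # All features are probabilities on the same scale, so weights are roughly
    # comparable, though correlated tags can share or trade off weight.
    print(f"{tag}: {weight:+.3f}")

# Score out-of-fold predictions so the evaluation uses pages unseen in training.
folds = KFold(n_splits=3, shuffle=True, random_state=0)
oof = cross_val_predict(LinearRegression(), X_meta, y, cv=folds)
auc = roc_auc_score(y, oof)
print(f"out-of-fold ROC AUC: {auc:.3f}")
```

Because linear regression outputs are not constrained to [0, 1], a ranking metric such as ROC AUC is a natural fit for judging them; the raw outputs can still be thresholded if hard labels are needed.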