JacksonDavenport / RedditPrediction

Use a script to collect the titles from specified subreddits, then use this collection to learn and predict the likelihood of a given title. This is used to predict which subreddit a chosen title should belong to.
Add a 'weight' field for Distribution and populate it with the best weighting for Unigram and Digram values per subreddit #13

Open JacksonDavenport opened 7 years ago

JacksonDavenport commented 7 years ago

- Add the weighting field and use it for the TitleModeling run instead of running through many weights.
- In CreateDistributions, while creating the distributions themselves, test all of the titles for that subreddit against the distribution and calculate the best weight per title.
- Take the best weight per title and combine them, using the average or some other aggregate; the result is the weighting value for that subreddit.

Summary: /r/nba will have a weighting of .65, showing a stronger correlation between consecutive words than /r/funny with .43. The modeling should both reflect this and take advantage of it for computation.
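
A rough sketch of the idea above, in Python. Everything here is an assumption for illustration, not code from the repo: the function names are hypothetical, and the weight is assumed to be the share given to the Digram probability in a linear mix with the Unigram probability (so a higher weight means consecutive words matter more). The candidate grid stops short of 0 and 1 so the mixed probability never reaches zero.

```python
import math
from collections import Counter
from statistics import mean

# Hypothetical candidate weights: 0.05, 0.10, ..., 0.95
CANDIDATES = [i / 20 for i in range(1, 20)]


def build_counts(titles):
    """Count single words and consecutive word pairs across all titles."""
    unigrams, bigrams = Counter(), Counter()
    for title in titles:
        words = title.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def title_log_likelihood(title, unigrams, bigrams, weight):
    """Log-likelihood of a title under weight*P_digram + (1-weight)*P_unigram."""
    words = title.lower().split()
    total = sum(unigrams.values())
    vocab = len(unigrams) + 1  # one extra slot for unseen words
    score = 0.0
    for prev, word in zip(words, words[1:]):
        # Add-one smoothing keeps the unigram probability above zero.
        p_uni = (unigrams[word] + 1) / (total + vocab)
        p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
        score += math.log(weight * p_bi + (1 - weight) * p_uni)
    return score


def best_title_weight(title, unigrams, bigrams, candidates=CANDIDATES):
    """Candidate weight that gives this title the highest likelihood."""
    return max(
        candidates,
        key=lambda w: title_log_likelihood(title, unigrams, bigrams, w),
    )


def subreddit_weight(titles, candidates=CANDIDATES):
    """Average of the best per-title weights; titles must be non-empty."""
    unigrams, bigrams = build_counts(titles)
    return mean(best_title_weight(t, unigrams, bigrams, candidates) for t in titles)
```

The value returned by `subreddit_weight` would be what gets stored in the 'weight' field. One caveat: scoring the same titles that built the counts tends to favor the Digram term, so holding some titles out, or using an aggregate other than the mean, may give a more useful weight.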