adam-broussard / bestreads


Selecting only well-measured genres and adding validation #41

Closed: adam-broussard closed this 2 years ago

adam-broussard commented 2 years ago

Validation will need more work. Currently, querying takes an average of about 8 seconds per description (after the paring down of genres also performed in this PR). We are getting roughly 62% prediction accuracy, where a prediction counts as correct if it falls within a book's top 5 voted genres. This figure is probably both a bit overly optimistic (some books don't have many more than 5 voted genres, so the model only needs to identify one of a small set) and understated (some genres are still treated as distinct when they likely aren't, e.g., "Fantasy" vs. "Fiction").
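The top-5 metric described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `top_k_accuracy` and the input layout (a dict of predicted genres and a dict of per-book genre vote counts) are hypothetical assumptions.

```python
def top_k_accuracy(predictions, voted_genres, k=5):
    """Fraction of books whose predicted genre is among its k most-voted genres.

    Hypothetical input layout (assumed, not from the PR):
      predictions:  dict book_id -> predicted genre (str)
      voted_genres: dict book_id -> dict genre -> vote count (int)
    """
    if not predictions:
        raise ValueError("no predictions to score")
    hits = 0
    for book_id, predicted in predictions.items():
        votes = voted_genres[book_id]
        # Rank this book's genres by vote count, highest first; ties keep dict order.
        top_k = sorted(votes, key=votes.get, reverse=True)[:k]
        if predicted in top_k:
            hits += 1
    return hits / len(predictions)
```

For example, with `k=2`, a book with only two voted genres counts as a hit for either one, which illustrates why books with few voted genres inflate the score:

```python
preds = {"a": "Fiction", "b": "Horror"}
votes = {"a": {"Fantasy": 30, "Fiction": 10},
         "b": {"Fantasy": 40, "Romance": 20, "Horror": 5}}
top_k_accuracy(preds, votes, k=2)  # "a" hits, "b" misses -> 0.5
```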

The thresholds on the number of books and the number of votes required to let a genre through can be tuned, but a value of 25 seems to work well off the bat (partly based on @youngant's suggestion in #37).
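One possible reading of the two thresholds is sketched below. The exact rule is an assumption, not taken from the PR: here a genre is kept only if at least `min_books` books each received at least `min_votes` votes for it. The function name `filter_genres` and the input layout are hypothetical.

```python
from collections import Counter


def filter_genres(book_genre_votes, min_books=25, min_votes=25):
    """Return the set of genres that pass both thresholds.

    Assumed rule (hypothetical): a genre survives if at least `min_books`
    books each have at least `min_votes` votes for that genre.
    book_genre_votes: dict book_id -> dict genre -> vote count (int)
    """
    qualifying_books = Counter()
    for votes in book_genre_votes.values():
        for genre, n_votes in votes.items():
            # Only count a book toward a genre if the vote is well measured.
            if n_votes >= min_votes:
                qualifying_books[genre] += 1
    return {genre for genre, n in qualifying_books.items() if n >= min_books}
```

With small thresholds for illustration, `filter_genres({"a": {"Fantasy": 30, "Fiction": 10}, "b": {"Fantasy": 40, "Horror": 5}}, min_books=2)` keeps only `{"Fantasy"}`, since it is the only genre with at least 25 votes on two books. Raising either threshold trades genre coverage for more reliable labels.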

Closes #37 and #39; makes progress on #32 and #23.

youngant commented 2 years ago

Not merging yet, just so you see my comment above.