Sotera / webpageclassifier

Categorizes a website, given its URL, into one of blog|wiki|news|forum|classified|shopping|undecided.
Apache License 2.0

Include errors in results #12

Closed: ctwardy closed this issue 7 years ago

ctwardy commented 7 years ago

The current code removes HTTP errors from the results. It shouldn't. Our main use case now involves cached HTML from SiteHound -- if the cached page is bad, we want to classify it as an error page.

Hence: include rules for labeling "error".
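
A rough sketch of what such rules might look like, for discussion only. Everything here is hypothetical: `ERROR_PATTERNS`, `label_error`, and `classify` are illustrative names, not functions from this repository, and the phrase list is a guess at common error-page wording rather than anything derived from SiteHound data. The idea is to label a page "error" instead of dropping it, and to fall through to the normal classifier otherwise.

```python
import re

# Hypothetical phrase rules; a real list would need tuning against cached pages.
ERROR_PATTERNS = [
    re.compile(r"\b(403|404|500|502|503)\b.*\b(error|not found|forbidden|unavailable)\b", re.I),
    re.compile(r"\bpage (not found|cannot be found|does not exist)\b", re.I),
    re.compile(r"\b(internal server error|bad gateway|service unavailable)\b", re.I),
]


def label_error(html, status_code=None):
    """Return "error" if the page looks like an error page, else None."""
    # An HTTP status of 400 or above is an error, when the status is known.
    if status_code is not None and status_code >= 400:
        return "error"
    # Cached HTML that is missing or blank is treated as an error page too.
    if not html or not html.strip():
        return "error"
    # Only scan the start of the document (title and top of body) to limit
    # false positives from ordinary pages that merely mention an error.
    head = html[:2000]
    for pattern in ERROR_PATTERNS:
        if pattern.search(head):
            return "error"
    return None


def classify(html, status_code=None, fallback=lambda html: "undecided"):
    """Keep error pages in the results by labeling them, not removing them.

    `fallback` stands in for the existing classifier (assumed interface).
    """
    return label_error(html, status_code) or fallback(html)
```

For example, `classify("<title>404 Not Found</title>")` would come back as "error" rather than being filtered out. These are heuristics, so a page that legitimately discusses HTTP errors near its top could be mislabeled.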

ctwardy commented 7 years ago

Fixed in #13, but not fully integrated.