TeamHG-Memex / soft404

A classifier for detecting soft 404 pages

Building a classifier from scratch #12

Open lopuhin opened 7 years ago

lopuhin commented 7 years ago

The current training dataset is too big to put in the repo or to host on S3 indefinitely. It was created with the crawler that is in the repo, but it would still be nice to have some way to re-train the classifier from scratch. See the discussion in #3.