TeamHG-Memex / sitehound-frontend

Site Hound (previously THH) is a Domain Discovery Tool
Apache License 2.0

Add Known URLs: allow incoming to be "Neutral" (vs. default "Relevant") #5

Open ctwardy opened 7 years ago

ctwardy commented 7 years ago

[Discussed on Slack #pagetype ~3 weeks ago. I had accidentally posted this to thh-classifiers.]

In "Add Known URLs", the user can supply a line-separated list of ostensibly known URLs. Currently they come in as "Relevant". I'd like these pages to come in tagged as "Neutral", so that I can then review them.

In my use case, I am using SiteHound to find the relevant ones. I supply hundreds of likely but unverified URLs from past crawls. Many were once relevant, but now return 404 or sit on expired domains. Some were simply false positives. Starting "Neutral" makes it easy to find the pages not yet sorted into "Relevant" and "Irrelevant".

That will be especially important if I later add a new batch of pages to review: coming in "Neutral" lets users easily find and tag them.

(Similarly the option could be extended to user-defined labels, though in my case coming in unlabeled is just right.)
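
To make the request concrete, here is a minimal sketch of the behavior I have in mind. It is not SiteHound's actual code: the names `parse_known_urls`, `unreviewed`, and `DEFAULT_LABEL`, and the record layout, are all hypothetical, and I am only assuming the three labels mentioned above.

```python
from typing import Dict, List

# Hypothetical default for incoming URLs; today's behavior corresponds to "Relevant".
DEFAULT_LABEL = "Neutral"


def parse_known_urls(text: str, label: str = DEFAULT_LABEL) -> List[Dict[str, str]]:
    """Turn a line-separated list of URLs into labeled records.

    Blank lines and exact duplicates are skipped. The `label` argument
    stands in for the requested option (it could also be a user-defined label).
    """
    seen = set()
    records = []
    for line in text.splitlines():
        url = line.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        records.append({"url": url, "label": label})
    return records


def unreviewed(records: List[Dict[str, str]], label: str = DEFAULT_LABEL) -> List[str]:
    """Return the URLs still carrying the incoming label, i.e. not yet sorted."""
    return [r["url"] for r in records if r["label"] == label]


# Example: three lines pasted in, one blank and one duplicate.
batch = parse_known_urls("http://a.example/\n\nhttp://b.example/\nhttp://a.example/")
batch[0]["label"] = "Relevant"  # the reviewer tags the first page
remaining = unreviewed(batch)   # only the page that still needs review
```

With "Relevant" as the incoming label, `unreviewed` would have no way to tell reviewed pages from freshly imported ones, which is the problem described above.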