This is to balance effectiveness against the risk of shipping buggy data (heuristic + bug label search results), ease of running the crawler, the ability to run the crawler daily, and the size of the seed database. We can always increase the size of the seed db later as we fix bugs (and multi-process/parallelize the crawler). For now we just want to start with something easier.