The only potential downside of this change is that we are no longer restricted to the politics category. Previously we crawled only ~60 articles; now the crawler runs indefinitely, so I stopped it at 485 pages:
2019-08-06 11:04:29 INFO: Processed 485 pages in 0:02:33.039399 => 3.17 Hz
2019-08-06 11:04:29 INFO: Found articles in 485/485 pages => 100.00%
2019-08-06 11:04:29 INFO: ... of these 0/485 had no date => 0.00%
2019-08-06 11:04:29 INFO: ... of these 290/485 had no byline => 59.79%
2019-08-06 11:04:29 INFO: ... of these 0/485 had no title => 0.00%
2019-08-06 11:04:29 INFO: Including skipped pages, there are articles in 485/485 pages => 100.00%
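As a quick sanity check on the summary above, the throughput and the no-byline percentage can be recomputed from the raw counts in the log. This is only a sketch: the variable names are made up for illustration and are not taken from the crawler's code.

```python
from datetime import timedelta

# Values copied from the log output above.
pages = 485
elapsed = timedelta(minutes=2, seconds=33, microseconds=39399)  # 0:02:33.039399
no_byline = 290

# Pages processed per second (the "Hz" figure in the log).
rate_hz = pages / elapsed.total_seconds()

# Share of pages whose article had no byline.
no_byline_pct = 100 * no_byline / pages

print(f"{rate_hz:.2f} Hz")
print(f"{no_byline_pct:.2f}%")
```

Both values agree with the log lines (3.17 Hz and 59.79%).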
Closes #198
I've checked, and many articles do not have a byline.