Open sylvinus opened 8 years ago
http://commoncrawl.org/2016/10/news-dataset-available/
We should make sure the news dataset works with the current Common Crawl source.
The CC news dataset currently has some formatting issues, which the Common Crawl team is fixing: https://github.com/commoncrawl/news-crawl/issues/11