commoncrawl / news-crawl

News crawling with StormCrawler - stores content as WARC
Apache License 2.0

Full support for sitemap extensions and namespaces #25

Closed by sebastian-nagel 5 years ago

sebastian-nagel commented 6 years ago

Of all sitemap types, only news sitemaps are accepted as a seed source. However,

Cf. crawler-commons/crawler-commons#162, crawler-commons/crawler-commons#174.
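
To illustrate what separates a news sitemap from a plain one, here is a minimal sketch in Python using only the standard library. It is not the news-crawl or crawler-commons implementation (both are Java). The function name `is_news_sitemap` and the two sample documents are hypothetical. The sketch assumes the standard sitemap namespace and the Google News sitemap extension namespace, and it matches namespaces strictly, so a sitemap that declares a different namespace URI would not be recognized.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the sitemap protocol and the Google News extension.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"

# Hypothetical sample: a sitemap whose <url> entry carries a <news:news> element.
NEWS_EXAMPLE = f"""<urlset xmlns="{SITEMAP_NS}" xmlns:news="{NEWS_NS}">
  <url>
    <loc>https://example.com/article-1</loc>
    <news:news>
      <news:publication>
        <news:name>Example Times</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2018-01-01</news:publication_date>
      <news:title>Example headline</news:title>
    </news:news>
  </url>
</urlset>"""

# Hypothetical sample: a plain sitemap without any extension elements.
PLAIN_EXAMPLE = f"""<urlset xmlns="{SITEMAP_NS}">
  <url>
    <loc>https://example.com/page-1</loc>
  </url>
</urlset>"""


def is_news_sitemap(xml_text: str) -> bool:
    """Return True if at least one <url> entry has a <news:news> child.

    Namespaces are matched strictly: both the sitemap and the news
    namespace must use exactly the URIs defined above.
    """
    root = ET.fromstring(xml_text)
    return root.find(f"{{{SITEMAP_NS}}}url/{{{NEWS_NS}}}news") is not None


if __name__ == "__main__":
    print(is_news_sitemap(NEWS_EXAMPLE))
    print(is_news_sitemap(PLAIN_EXAMPLE))
```

A check of this kind rejects every sitemap that lacks the news extension, which is the behavior described above. Supporting other extensions would mean looking for further namespaces in the same way.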