commoncrawl / news-crawl

News crawling with StormCrawler - stores content as WARC
Apache License 2.0

News archive is not available since 2023-10-23 15:36:50 #57

Closed zikolach closed 10 months ago

zikolach commented 10 months ago

Since 2023-10-23 15:36:50, no new CC-NEWS WARC files have been listed in https://data.commoncrawl.org/crawl-data/CC-NEWS/2023/10/warc.paths.gz

curl -s -o - https://data.commoncrawl.org/crawl-data/CC-NEWS/2023/10/warc.paths.gz | gzip --decompress | tail -n 1
crawl-data/CC-NEWS/2023/10/CC-NEWS-20231023153650-02160.warc.gz
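For anyone who wants to repeat this check, here is a minimal sketch that wraps the pipeline above and turns the last listed file name into a readable timestamp. The function names (`latest_warc`, `warc_timestamp`, `format_timestamp`) are hypothetical. The `CC-NEWS-<YYYYMMDDhhmmss>-<serial>.warc.gz` naming is an assumption inferred from the single path shown above, not from documentation.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the file-name layout
# is assumed from the one example path in this issue.

# Print the last WARC path listed for a given year and month.
# Usage: latest_warc 2023 10   (needs network access)
latest_warc() {
  curl -sf "https://data.commoncrawl.org/crawl-data/CC-NEWS/$1/$2/warc.paths.gz" \
    | gzip --decompress \
    | tail -n 1
}

# Extract the 14-digit timestamp from a CC-NEWS WARC path.
warc_timestamp() {
  name=${1##*/}        # drop the directory part
  ts=${name#CC-NEWS-}  # drop the assumed "CC-NEWS-" prefix
  ts=${ts%%-*}         # keep everything before the next hyphen
  printf '%s\n' "$ts"
}

# Reformat YYYYMMDDhhmmss as "YYYY-MM-DD hh:mm:ss".
format_timestamp() {
  printf '%s\n' "$1" \
    | sed -E 's/^(.{4})(.{2})(.{2})(.{2})(.{2})(.{2})$/\1-\2-\3 \4:\5:\6/'
}
```

For example, `format_timestamp "$(warc_timestamp "$(latest_warc 2023 10)")"` would print the time of the newest listed file. For the path shown above, that is the 2023-10-23 15:36:50 quoted in the title.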

Could you please help? Did something go wrong, as it did last time, or did I miss an announcement?

Thanks in advance!

jnioche commented 10 months ago

Thanks @zikolach. See the answer on the user list.