Closed by zikolach 3 years ago
Thanks, @zikolach! Issue confirmed: it was caused by a partial failure of the status index. The crawler is still running but is discovering practically no new articles. I hope to get it fixed in a few hours.
The crawler is now back to normal and the first WARC file is uploaded. As expected, during the first two hours the crawler was mostly occupied fetching and parsing all the feeds and news sitemaps missed since Saturday 20:07 UTC when the status index failed. It's now running well and creating multiple WARC files per hour - to be uploaded soon. Thanks again, @zikolach!
@sebastian-nagel thanks a lot for the quick response and the fix!
There seems to be only one file available for 2021-06-06 and nothing since then. Are there any changes related to the news dataset?