Medstaar closed 2 years ago
That's intentional. Common Crawl is having an outage. You'll note that after 5 cycles it prints a visible-by-default warning. I added that after a previous complaint that a short Common Crawl outage crashed their long-running job.
I have notified the Common Crawl engineer; he works German hours, so it might not be fixed for 12 hours.
Once CC stops being overloaded, I've added code that makes it clear that the 500 status results are actually 503 "slow down" from Amazon for the actual files of the index.
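For readers following along, the "keep retrying, but warn after 5 cycles" behaviour described above could look roughly like the sketch below. This is not the cdx_toolkit source: `fetch_with_warning`, the `fetch` callable, and the `warn_after`, `delay`, and `sleep` parameters are hypothetical names chosen for illustration, and the exact threshold comparison is an assumption.

```python
import logging
import time

LOGGER = logging.getLogger(__name__)

# Status codes treated as transient, taken from the list quoted in this issue.
RETRYABLE = {429, 500, 502, 503, 504, 509}


def fetch_with_warning(fetch, warn_after=5, delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-retryable status.

    `fetch` is a hypothetical zero-argument callable returning (status, body).
    The loop never gives up, so a short outage cannot crash a long-running
    job; it only gets louder once `warn_after` retries have happened.
    """
    retries = 0
    while True:
        status, body = fetch()
        if status not in RETRYABLE:
            return status, body
        retries += 1
        # DEBUG is hidden by default; WARNING is visible by default.
        level = logging.WARNING if retries >= warn_after else logging.DEBUG
        LOGGER.log(level, "retrying after %.0fs for %d (retry %d)",
                   delay, status, retries)
        sleep(delay)
```

The trade-off is that a long outage looks like a hang unless the warning is noticed, which is what the original report ran into.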
Thanks for the clarification @wumpus, I didn't realize there was an outage. I will close the ticket and try again to see if it's working okay.
OS: Windows 10, Python 3.8.5, cdx_toolkit 0.9.34
I'm having an issue where the CDX toolkit gets stuck in a loop and prints out
cdx_toolkit.myrequests myrequests.py: 62 : retrying after 1s for 500
constantly. I've tracked this down to this line in myrequests.py. If I am reading this correctly, if the response status is always one of 429, 500, 502, 503, 504, 509, you will be stuck in this retry loop. I suggest that after line 62 we break out of the loop if the number of retries is greater than 5.
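To make the suggestion concrete, here is a minimal sketch of a retry loop with an upper bound. It is not the actual code in myrequests.py: `fetch_with_retry_limit`, `RetriesExhausted`, the `fetch` callable, and the `max_retries`, `delay`, and `sleep` parameters are all hypothetical names used for illustration, and raising an exception (rather than a bare `break`) is my assumption about how the caller would want to learn that the retries ran out.

```python
import time

# Status codes that currently trigger a retry, as listed above.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504, 509}


class RetriesExhausted(Exception):
    """Raised when the server keeps returning a retryable status."""


def fetch_with_retry_limit(fetch, max_retries=5, delay=1.0, sleep=time.sleep):
    """Call fetch() and retry on transient statuses, at most max_retries times.

    `fetch` is a hypothetical zero-argument callable returning (status, body).
    """
    retries = 0
    while True:
        status, body = fetch()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if retries >= max_retries:
            # Stop looping instead of retrying forever.
            raise RetriesExhausted(
                f"gave up after {retries} retries, last status {status}"
            )
        retries += 1
        sleep(delay)
```

With `max_retries=5` this makes one initial request plus at most five retries before giving up.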