cocrawler / cdx_toolkit

A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine
Apache License 2.0

404 seen for API call #19

Closed. yujianll closed this issue 3 years ago.

yujianll commented 3 years ago

Hi,

I'm getting the error "ValueError: 404 seen for API call, did you configure the endpoint correctly?" from the following code:

import cdx_toolkit

# Query the Common Crawl index for cnn.com captures that returned HTTP 200
cdx = cdx_toolkit.CDXFetcher(source='cc')
url = "https://www.cnn.com/*"
objs = list(cdx.iter(url, from_ts='202001', to='202002', filter=['status:200']))

The same code worked two days ago but fails now. Do you know why this would happen? Thanks!

wumpus commented 3 years ago

This bug was caused by Common Crawl updating their server to pywb 2.5, which had a slight API change.

Please install the new 0.9.31 version of cdx_toolkit for a fix.
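
To check whether an environment already has the fix, something like the sketch below may help. It is not part of cdx_toolkit. The helper names (parse_version, has_fix, installed_cdx_toolkit_version) are made up for illustration, and it assumes the package metadata is registered under the name "cdx_toolkit" and that versions are plain dotted numbers. Upgrading itself is normally done with "pip install --upgrade cdx_toolkit".

```python
from importlib.metadata import PackageNotFoundError, version

# First release that contains the fix, per the comment above.
FIXED_VERSION = (0, 9, 31)


def parse_version(text):
    """Turn a dotted version such as '0.9.31' into a tuple of ints.

    Hypothetical helper: parsing stops at the first component that does
    not start with a digit, so suffixes like '.dev1' are ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(installed):
    """Return True if the given version string is 0.9.31 or newer."""
    return parse_version(installed) >= FIXED_VERSION


def installed_cdx_toolkit_version():
    """Return the installed version string, or None if it is not installed.

    Assumption: the distribution is discoverable as 'cdx_toolkit'.
    """
    try:
        return version("cdx_toolkit")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_cdx_toolkit_version()
    if found is None:
        print("cdx_toolkit is not installed")
    elif has_fix(found):
        print(f"cdx_toolkit {found} includes the fix")
    else:
        print(f"cdx_toolkit {found} predates 0.9.31, please upgrade")
```

If the script reports an older version, upgrading and re-running the original query should show whether the 404 goes away.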