akamhy / waybackpy

Wayback Machine API interface & a command-line tool
https://pypi.org/project/waybackpy/
MIT License

"No archive URL found in the API response" errors #82

Closed dequeued0 closed 2 years ago

dequeued0 commented 3 years ago

I often get this error even though saving the archive actually succeeded (rechecking later finds it). The error message could be more specific about the failure, and I'm not sure the upgrade suggestion is helpful when the installed version is already current.

No archive URL found in the API response. If 'https://old.reddit.com/<some post on Reddit here>' can be accessed via your web browser then either this version of waybackpy (2.4.0) is out of date or WayBack Machine is malfunctioning. Visit 'https://github.com/akamhy/waybackpy' for the latest version of waybackpy.
Header:
{'Server': 'nginx/1.15.8', 'Date': 'Thu, 14 Jan 2021 22:13:08 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '232', 'Connection': 'keep-alive', 'X-App-Server': 'wwwb-app52', 'X-ts': '404', 'X-Tr': '138505', 'X-RL': '0', 'X-NA': '0', 'X-Page-Cache': 'MISS', 'X-NID': 'Google'}
akamhy commented 3 years ago

> Header: {'Server': 'nginx/1.15.8', 'Date': 'Thu, 14 Jan 2021 22:13:08 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '232', 'Connection': 'keep-alive', 'X-App-Server': 'wwwb-app52', 'X-ts': '404', 'X-Tr': '138505', 'X-RL': '0', 'X-NA': '0', 'X-Page-Cache': 'MISS', 'X-NID': 'Google'}

I don't know what all the keys in the header of the failed request represent. I will try contacting the Internet Archive for help.

> rechecking later finds it

I'm going to try fetching the newest archive before raising the error. If the difference in timestamps is less than 30 minutes, waybackpy will return the newest archive instead. According to IA, the Wayback Machine doesn't allow more than one archive per 30 minutes.
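The fallback described above could be sketched roughly as follows. This is only an illustration of the idea, not waybackpy's actual implementation: `fallback_to_newest`, `NoArchiveFound`, and `SAVE_WINDOW` are hypothetical names, and the 30-minute window is simply the figure quoted above.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: the 30-minute figure comes from the comment above.
SAVE_WINDOW = timedelta(minutes=30)


class NoArchiveFound(Exception):
    """Hypothetical error type for this sketch, not waybackpy's own exception."""


def fallback_to_newest(newest: Optional[datetime],
                       requested_at: datetime,
                       window: timedelta = SAVE_WINDOW) -> datetime:
    """Return the newest archive's timestamp if it is recent enough, else raise.

    newest       -- timestamp of the newest known archive, or None if there is none
    requested_at -- time at which the save request was made
    """
    if newest is None:
        # Brand-new URLs may have no archive at all, so there is nothing to fall back to.
        raise NoArchiveFound("no archive exists for this URL")
    # abs() tolerates small clock skew between the client and the archive.
    if abs(requested_at - newest) < window:
        return newest
    raise NoArchiveFound("newest archive is older than the save window")


if __name__ == "__main__":
    request_time = datetime(2021, 1, 14, 22, 13, 8)
    recent = request_time - timedelta(minutes=5)
    # The archive is 5 minutes old, so it is accepted as the save result.
    print(fallback_to_newest(recent, request_time))
```

Raising when `newest` is `None` keeps the original error path for URLs that have never been archived.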

dequeued0 commented 3 years ago

> I'm going to try fetching the newest archive before raising the error. If the difference in timestamps is less than 30 minutes, waybackpy will return the newest archive instead. According to IA, the Wayback Machine doesn't allow more than one archive per 30 minutes.

Sounds good. Perhaps a flag should be set on the archive object indicating that the archive is older than the save request?

Also note that the URLs I am archiving are very new so there is no previous archive.

akamhy commented 3 years ago

The flag is cached_save. If it is True, the archive was cached by the Wayback Machine.

Usage:

>>> import waybackpy

>>> url = "https://en.wikipedia.org/wiki/Multivariable_calculus"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"

>>> wayback = waybackpy.Url(url, user_agent)

>>> archive = wayback.save()
>>> archive.cached_save
True

True indicates a cached archive.
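A caller might branch on that flag along these lines. This is a minimal sketch: `describe_save` is a hypothetical helper, and the only thing taken from the thread is the `cached_save` attribute on the object returned by `save()`. A stand-in object is used here so the example runs without network access.

```python
from types import SimpleNamespace


def describe_save(archive) -> str:
    """Summarise a save result based on its cached_save flag (hypothetical helper)."""
    # getattr with a default keeps the sketch working on objects that lack the flag.
    if getattr(archive, "cached_save", False):
        return "cached: the Wayback Machine returned an existing archive"
    return "fresh: a new archive was created"


if __name__ == "__main__":
    # Stand-in for the object returned by wayback.save().
    fake_archive = SimpleNamespace(cached_save=True)
    print(describe_save(fake_archive))
```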