Open sissbruecker opened 2 years ago
Hmm, with limit = -1 sometimes you don't get any result at all from the CDX API. For example:
http://web.archive.org/cdx/search/cdx?url=https://github.com/awslabs/aws-serverless-express&gzip=false&showResumeKey=true&limit=-1
returns an empty response.
However:
http://web.archive.org/cdx/search/cdx?url=https://github.com/awslabs/aws-serverless-express&gzip=false&showResumeKey=true&limit=-5
returns 5 entries.
The CDX API docs are not super clear, but that looks like a bug. A workaround could be to use a higher limit for newest, and then only take the first result.
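A rough sketch of that workaround against the raw CDX endpoint, not how waybackpy implements it. The function names are made up for illustration, and I'm assuming the default space-separated output where the second field is the 14-digit timestamp. Instead of relying on result order, it picks the highest timestamp from the returned rows:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_url(url, limit=-5):
    """Build a CDX query URL; a negative limit asks for the last N captures."""
    params = {"url": url, "gzip": "false", "limit": str(limit)}
    return CDX_ENDPOINT + "?" + urlencode(params)


def newest_timestamp(cdx_text):
    """Return the highest 14-digit timestamp in a CDX text response, or None.

    Assumes the default field order (urlkey, timestamp, original, ...).
    Blank lines and a trailing resume key are skipped because they have
    no valid timestamp in the second field.
    """
    timestamps = []
    for line in cdx_text.splitlines():
        fields = line.split(" ")
        if len(fields) >= 2 and len(fields[1]) == 14 and fields[1].isdigit():
            timestamps.append(fields[1])
    # Timestamps are fixed-width yyyyMMddhhmmss, so string max is the latest.
    return max(timestamps) if timestamps else None


def fetch_newest_timestamp(url, limit=-5):
    """Query the CDX server (network call) and return the newest timestamp."""
    with urlopen(build_cdx_url(url, limit), timeout=30) as response:
        return newest_timestamp(response.read().decode("utf-8"))
```

Calling fetch_newest_timestamp("https://github.com/awslabs/aws-serverless-express") would then return None only if the server sends back no rows at all for limit = -5.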
Describe the bug
Using WaybackMachineCDXServerAPI.newest does not return the last snapshot, but some recent snapshot. For example, for https://openlayers.org/ it returns a snapshot from 2022-06-16 17:20:36, while the latest snapshot (as of today, September 10th 2022) is from 2022-09-10 08:05:37. There are around 380 snapshots between these two.
I've debugged this a bit and it seems there is an issue with how sort or limit are either configured or interpreted by the CDX server. The method sets sort = 'closest' and limit = 1. If I configure the WaybackMachineCDXServerAPI instance manually and set limit = -1 instead, then I actually get the latest snapshot. https://github.com/akamhy/waybackpy/issues/155#issuecomment-1041882795 hints that limit = -1 should be used for the latest snapshot.
To Reproduce
Workaround
Expected behavior
The newest API should return the newest snapshot.
Version: