akamhy / waybackpy

Wayback Machine API interface & a command-line tool
https://pypi.org/project/waybackpy/
MIT License
465 stars, 34 forks

Why not implement newest, near and oldest for the CDX Server API as we have for the Availability API #155

Closed: akamhy closed this issue 2 years ago

akamhy commented 2 years ago

Is your feature request related to a problem? Please describe. Yes. The Availability API is less reliable than the CDX Server API, but the CDX Server API has a steeper learning curve. Instead of telling users to implement these methods themselves on top of the interface, it would be better to provide them in the WaybackMachineCDXServerAPI class.

Describe the solution you'd like The three methods newest, near and oldest should be available in the WaybackMachineCDXServerAPI class, as they already are for the Availability API.

Describe alternatives you've considered N/A

Additional context See #154

akamhy commented 2 years ago

See also the CDX server source: https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/src/main/java/org/archive/cdxserver/CDXServer.java

akamhy commented 2 years ago

Near can be implemented by leveraging the to and from params.

waybackpy --url google.com --cdx --limit 1 --from 201010101010
waybackpy --url google.com --cdx --limit -1 --to 201010101010

Of the two results, pick the one closest to the target timestamp, preferring the one with the better HTTP status code.
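
The selection step could be sketched roughly as follows. This is only an illustration, not waybackpy code: the Snapshot tuple, pick_near and the status ranking (2xx before 3xx before everything else) are assumptions made for the example, and it expects full 14-digit timestamps such as those the CDX server returns.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Snapshot(NamedTuple):
    # Hypothetical container for one CDX result; not a waybackpy class.
    timestamp: str   # 14-digit Wayback timestamp, e.g. "20101010101010"
    statuscode: str  # CDX status code field, e.g. "200", "301" or "-"


def parse_ts(ts: str) -> datetime:
    """Parse a full 14-digit Wayback timestamp."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def status_rank(code: str) -> int:
    """Assumed ordering: 2xx is best, then 3xx, then anything else."""
    if code.startswith("2"):
        return 0
    if code.startswith("3"):
        return 1
    return 2


def pick_near(
    target: str,
    after: Optional[Snapshot],
    before: Optional[Snapshot],
) -> Optional[Snapshot]:
    """Choose between the first capture on/after the target (--from query)
    and the last capture on/before it (--to query)."""
    candidates = [s for s in (after, before) if s is not None]
    if not candidates:
        return None
    wanted = parse_ts(target)
    # Better status wins first; ties are broken by distance to the target.
    return min(
        candidates,
        key=lambda s: (status_rank(s.statuscode),
                       abs(parse_ts(s.timestamp) - wanted)),
    )
```

Ranking by status first means a slightly more distant 200 capture beats a closer redirect; swapping the two key components would give the opposite policy.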

akamhy commented 2 years ago

Use https://github.com/internetarchive/wayback/issues/237#issuecomment-1042577291 for near.

akamhy commented 2 years ago

Implement near using the closest-sorted query https://web.archive.org/cdx/search/cdx?url=google.com&limit=1&closest=20101010101010&sort=closest&filter=statuscode:200. oldest and newest should then invoke near with appropriate arguments.
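
A minimal sketch of how that could fit together, building only the request URLs. The function names, the 1994 lower bound for oldest and the use of the current UTC time for newest are assumptions for illustration, not necessarily what waybackpy ends up doing.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def near_query(url: str, closest: str, status: str = "200") -> str:
    """Build a CDX query for the single capture closest to `closest`."""
    params = {
        "url": url,
        "limit": 1,
        "closest": closest,
        "sort": "closest",
        "filter": f"statuscode:{status}",
    }
    # Keep ":" unescaped so the URL matches the form quoted in the issue.
    return f"{CDX_ENDPOINT}?{urlencode(params, safe=':')}"


def oldest_query(url: str) -> str:
    """Assumption: a timestamp earlier than any capture yields the oldest one."""
    return near_query(url, "19940101000000")


def newest_query(url: str, now: Optional[datetime] = None) -> str:
    """Assumption: asking for the capture closest to 'now' yields the newest one."""
    now = now or datetime.now(timezone.utc)
    return near_query(url, now.strftime("%Y%m%d%H%M%S"))


print(near_query("google.com", "20101010101010"))
```

Fetching the resulting URL (for example with requests.get) should return at most one CDX line, which can then be parsed into a snapshot.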