dessant / web-archives

Browser extension for viewing archived and cached versions of web pages, available for Chrome, Edge and Safari
https://apps.apple.com/us/app/web-archives-for-safari/id1603181853
GNU General Public License v3.0

Improve Wayback Machine snapshot selection #51

Open dessant opened 2 years ago

dessant commented 2 years ago

I'll quote a review here because it contains useful information.

https://addons.mozilla.org/en-US/firefox/addon/view-page-archive/reviews/1836542/

It works to some extent, and I like its ability to open all the different web archives with one click.

There is an issue with some archives, though. When I click on a URL for an outdated Microsoft.com page, the Wayback Machine takes me to Microsoft's Error404 landing page.

This URL for example:
https://www.microsoft.com/en-us/download/details.aspx?id=45885

When passed to this add-on, the Wayback Machine converts it to this page:
https://web.archive.org/web/20220328035922/https://www.microsoft.com/en-us/download/404Error.aspx

*It first lands on a Wayback redirect page, which redirects to another page after 5 seconds.

We can see that the date in the URL is 2022-03-28 03:59:22, which makes this one of the newest snapshots created by the Archive. Unfortunately, the Wayback Machine continues to create snapshots of these 404 pages.
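To illustrate how the date is read out of a snapshot link, here is a minimal sketch that splits a Wayback URL into its capture time and the archived page URL. It assumes the usual `https://web.archive.org/web/<14-digit timestamp>/<original URL>` layout; the function name `parse_snapshot_url` is made up for this example and is not part of the extension.

```python
import re
from datetime import datetime

# Assumed layout: https://web.archive.org/web/<YYYYMMDDhhmmss>[modifier]/<original URL>
SNAPSHOT_PATTERN = re.compile(
    r"^https?://web\.archive\.org/web/(\d{14})[a-z_]*/(.+)$"
)


def parse_snapshot_url(snapshot_url):
    """Return (capture time, archived page URL) for a Wayback snapshot link."""
    match = SNAPSHOT_PATTERN.match(snapshot_url)
    if match is None:
        raise ValueError(f"not a Wayback snapshot URL: {snapshot_url}")
    # The 14-digit timestamp is year, month, day, hour, minute, second.
    captured_at = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured_at, match.group(2)


if __name__ == "__main__":
    captured_at, page_url = parse_snapshot_url(
        "https://web.archive.org/web/20220328035922/"
        "https://www.microsoft.com/en-us/download/404Error.aspx"
    )
    print(captured_at.isoformat(), page_url)
```

Comparing the returned page URL with the URL that was originally requested is one way to notice that the snapshot belongs to the 404 landing page rather than the page being looked for.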

Someone might ask: why not just use the Wayback date toolbar to go back to an older date? The problem is that your tool looks up the newest snapshot, which is one of these 404 pages, and that changes the URL we're searching for.

The API docs for the Wayback Machine say "timestamp is the timestamp to look up in Wayback. If not specified, the most recently available capture in Wayback is returned."

The correct way to use the API is to create a link like this:
http://archive.org/wayback/available?url=https://www.microsoft.com/en-us/download/details.aspx?id=45885&timestamp=20010101
*This returns JSON containing a working "closest snapshot" URL that you can click on.

It appears that this add-on is not using the API but is trying to manipulate URLs instead. This won't work well.

If you add the "&timestamp=20010101" parameter, the API returns the snapshot closest to 2001-01-01 rather than the newest available snapshot. The downside is that you'll need to write something to handle the JSON data the API returns (which shouldn't be very hard).
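The JSON handling mentioned above could be sketched roughly as follows. This is an illustration, not the extension's code: the helper names are hypothetical, and the response shape (`archived_snapshots` → `closest` → `url`) is assumed from the availability API's documented example. The page URL is percent-encoded here because it carries its own `?id=...` query string, which could otherwise be confused with the `timestamp` parameter.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp="20010101"):
    """Build an availability query for the snapshot closest to `timestamp`."""
    # urlencode percent-encodes page_url so its own query string stays intact.
    query = urlencode({"url": page_url, "timestamp": timestamp})
    return f"{AVAILABILITY_ENDPOINT}?{query}"


def closest_snapshot(payload):
    """Return the closest snapshot URL from a parsed response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


def find_snapshot(page_url, timestamp="20010101"):
    """Query the API over the network and return the closest snapshot URL."""
    request_url = build_availability_url(page_url, timestamp)
    with urlopen(request_url, timeout=10) as response:
        return closest_snapshot(json.load(response))


if __name__ == "__main__":
    print(find_snapshot(
        "https://www.microsoft.com/en-us/download/details.aspx?id=45885"
    ))
```

When no capture exists, the API is assumed to return an empty `archived_snapshots` object, in which case `closest_snapshot` returns `None` and the caller can fall back to another archive.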

Doing it that way will ALWAYS return a website. Not those Error404 landing pages.

nyanpasu64 commented 2 years ago

Can you redirect to * (a date picker) rather than a particular date?

dessant commented 1 year ago

@nyanpasu64, that is already possible using the Wayback Machine (all) engine; visit the extension's options to enable it.