Rhizome-Conifer / conifer

Collect and revisit web pages.
https://conifer.rhizome.org
Apache License 2.0

Following links in extraction mode leads to live web #620

Open patshiu opened 5 years ago

patshiu commented 5 years ago

Problem:

When extracting a page from a public archive, following a link should take users to the closest available recording of the target page in public archives. Currently, however, following any link more than one click away from the original extraction target always leads to the live web page instead.

The specific links I used

LEFT: Webrecorder in Extraction Mode | RIGHT: Following same links on Internet Archive

Original target of extraction: https://web.archive.org/web/20030219054634/http://www.salon.com:80/tech/feature/2002/03/01/netochka/index1.html

First link clicked (European Net arts list Syndicate): https://web.archive.org/web/20030221061935/http://anart.no:80/~syndicate/

Second link clicked (do birds [shakeZkknut] ?): https://web.archive.org/web/20030405210402/http://anart.no:80/~syndicate/KKnut/index.html

On the left is the live page, even though the extraction widget still says "2003".

ikreymer commented 5 years ago

This has to do with the slash redirect not being propagated. The IA page redirects from http://anart.no:80/~syndicate -> http://anart.no:80/~syndicate/, but that redirect is not reflected in the extraction.

Thus, it tries to extract http://anart.no:80/KKnut/index.html, which is not archived anywhere, instead of http://anart.no:80/~syndicate/KKnut/index.html, resulting in a live 404.
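
The effect of the missing trailing slash can be reproduced with standard URL resolution. The following is only a minimal sketch using Python's `urllib.parse.urljoin`, not Conifer's actual extraction code. It assumes the link on the Syndicate page is the relative href `KKnut/index.html`, which is inferred from the two URLs above rather than confirmed from the page source. The helper name `resolve_link` is hypothetical.

```python
from urllib.parse import urljoin

# Assumption: the page links to this relative href (inferred from the
# URLs quoted in this thread, not checked against the archived HTML).
RELATIVE_LINK = "KKnut/index.html"


def resolve_link(base_url: str, href: str) -> str:
    """Resolve href against base_url the way a browser would (RFC 3986)."""
    return urljoin(base_url, href)


# Base URL as originally requested, before the archive's slash redirect.
requested = "http://anart.no:80/~syndicate"
# Base URL after the redirect has been followed.
redirected = "http://anart.no:80/~syndicate/"

# Without the trailing slash, "~syndicate" is treated as a file name and
# is dropped from the path before the relative link is appended.
wrong = resolve_link(requested, RELATIVE_LINK)
# With the trailing slash, "~syndicate/" is kept as a directory.
right = resolve_link(redirected, RELATIVE_LINK)

print(wrong)  # http://anart.no:80/KKnut/index.html
print(right)  # http://anart.no:80/~syndicate/KKnut/index.html
```

If the extraction resolves relative links against the pre-redirect URL, it would arrive at the first result, which matches the unarchived URL described above.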