borowiak / pwa-technologies

Automatically exported from code.google.com/p/pwa-technologies
Several different versions of a URL point to the same archived version. #14

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Search for "www.dn.pt"
2. Open the first version of 2004: 9 Feb 
(http://arquivo.pt/wayback/wayback/id35066index0)
3. The archived page of 14 Feb 2005 is shown.
4. Go back.
5. Open the version 31 Mar 2004 
(http://arquivo.pt/wayback/wayback/id36964index0)
6. The archived page of 14 Feb 2005 is shown.

The same behavior can be observed for every version up until 2006.
From 2006 to 2008, the same behavior happens with another archived version 
(http://arquivo.pt/wayback/wayback/20080212134236/http://dn.sapo.pt/).

Other archived websites present this behavior, e.g. "www.expresso.pt".

Original issue reported on code.google.com by devel.da...@vcruz.net on 3 Feb 2012 at 2:51

GoogleCodeExporter commented 9 years ago
Here's another example:

http://arquivo.pt/search.jsp?l=pt&query=ipb.pt&btnSubmit=Pesquisar+no+Arquivo

6 Jun 2010 links to:  17 December 2009
26 Sep 2009 links to:  17 December 2009

There are also a lot of broken results:

6 Jun 2010 links to:  Resource Not In Archive
6 Jun 2003 links to:  Resource Not In Archive
22 Jun 2003 links to:  Resource Not In Archive
21 May 2005 links to:  Resource Not In Archive
10 Feb 2005 links to:  Resource Not In Archive

Original comment by danielco...@gmail.com on 13 Feb 2012 at 6:25

GoogleCodeExporter commented 9 years ago
Still another one.

Searching for 
mega.ist.utl.pt/~sinfo

http://arquivo.pt/search.jsp?l=pt&query=mega.ist.utl.pt%2F~sinfo&btnSubmit=Pesquisar+no+Arquivo

All versions from 2003 point to a version from 19 Oct, 2002.
Versions from 2002 up to 11 May point to a version from 02 May, 2001.
Versions from 2004 up to 19 March point to the same version from 02 May, 2001.
The same holds for the dates 23 May, 1 Sep, 24 Sep and 11 Dec, which also point to 02 May, 2001.
It also happens for dates from 2005 and 2006, which still point to 02 May, 2001.
I didn't check them all, but it happens at least with the versions from 31 Dec, 2005 and 13 Aug, 2006.

Original comment by joaocarv...@gmail.com on 2 May 2012 at 4:32

GoogleCodeExporter commented 9 years ago
This issue seems to be more frequent than initially thought.
Changing to critical because this issue misleads users and prevents access to 
the relevant archived versions.

Original comment by devel.da...@vcruz.net on 14 May 2012 at 5:19

GoogleCodeExporter commented 9 years ago
The same error occurs for the query: www.worten.pt

http://arquivo.pt/search.jsp?query=http%3A%2F%2Fwww.worten.pt%2F&dateStart=01/01/1996&dateEnd=31/12/2011&pos=1

Versions from 2006 redirect to versions from 2002.

Original comment by migco...@gmail.com on 28 May 2012 at 9:41

GoogleCodeExporter commented 9 years ago
After further investigation, the trigger is the redirection of archived pages. 
When the pages were crawled, the redirect information was preserved. However, 
when an archived page is replayed, the redirection is silently re-applied.

The big issue here is: why does the redirection point to a page much older or 
much newer when a temporally closer version is available?
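
For reference, a minimal sketch of the behavior one would expect when a redirect is replayed: among the captures of the redirect target, pick the one closest in time to the version the user clicked. The function name `closest_capture` and the in-memory list of captures are assumptions made for illustration, not the replay code used by the archive. The dates are the ones reported in this thread for http://www.worten.pt/home.html; the times of day are unknown, so midnight is assumed.

```python
from datetime import datetime


def closest_capture(requested, captures):
    """Return the capture timestamp nearest in time to `requested`.

    Hypothetical helper: `captures` stands in for whatever index lookup
    returns all archived timestamps of the redirect target.
    """
    if not captures:
        raise ValueError("no captures available for the redirect target")
    return min(captures, key=lambda ts: abs(ts - requested))


# Version the user clicked for www.worten.pt (time of day assumed).
requested = datetime(2006, 4, 11)

# Captures of the redirect target mentioned in this thread (times assumed).
captures_of_target = [
    datetime(2002, 8, 2),
    datetime(2006, 1, 26),
    datetime(2006, 4, 12),
]

chosen = closest_capture(requested, captures_of_target)
print(chosen.date())  # 2006-04-12
```

With this selection rule the 11 April 2006 version would resolve to the 12 April 2006 capture rather than the one from 2 Aug 2002.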

Example:
1. search for: www.dn.pt
2. click the version from "9 Feb 2004" 
(arquivo.pt/wayback/wayback/id35066index0)
3. the browser is redirected to the page "http://dn.sapo.pt/" from "14 Feb 2005" 
(http://arquivo.pt/wayback/wayback/id136index2)
- For "http://dn.sapo.pt/" there is a version from "9 Feb 2004", but that's not 
where the browser is redirected to.

Example2:
1. search for: www.worten.pt
2. click the version from "11 April 2006" 
(arquivo.pt/wayback/wayback/id15068247index0)
3. the browser is redirected to the page "http://www.worten.pt/home.html" from 2 
Aug 2002 (arquivo.pt/wayback/wayback/id39948470index0)
- For "http://www.worten.pt/home.html" the dates closest to "11 April 2006" are 
"26 Jan 2006" and "12 April 2006", but the version from "2 Aug 2002" is opened.

Original comment by devel.da...@vcruz.net on 28 May 2012 at 5:01

GoogleCodeExporter commented 9 years ago
More examples of pages with this issue.

page from RTP:
http://arquivo.pt/search.jsp?query=http%3A%2F%2Fwww.rtp.pt%2F&dateStart=01/01/1996&dateEnd=31/12/2010&pos=1

- clicked: 8 feb 1999 → archived version displayed: 28 jan 2000
- clicked: 31 mar 2001 → archived version displayed: 28 jan 2000
- clicked: 24 jan 2002 → archived version displayed: 10 feb 2002

Original comment by devel.da...@vcruz.net on 31 Jul 2012 at 3:40