borowiak / pwa-technologies

Automatically exported from code.google.com/p/pwa-technologies

Archived pages try to load resources from live Web before archived resources #38

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
When an archived page is loaded, its embedded contents (for instance, images) 
are first requested from the original servers, and only then do the JavaScript 
functions rewrite the links to display the archived embedded contents.
This generates many requests to the servers where the contents were 
originally located, producing a burst of 404 and 3XX status codes on the 
original hosts.

This may:
- put additional load on the original servers, since a GET request is made for 
every resource in the page, including ones that may be long gone. For big pages 
and popular pages, this could lead to a huge number of HTTP GET requests for 
old/nonexistent resources.
- lead to a flash of still-existing content that wasn't retrieved by the Web 
Archive, confusing users.
- if there is a newer version of a resource (e.g., an image) at a given URL, 
cause that newer version to flash before being replaced by the archived 
older version.

I attached a report from WebPageTest that details the requests made for the 
archived version of www.fccn.pt from 20 Jan 2011 
(http://arquivo.pt/wayback/wayback/20110120234828/http://fccn.pt/index.php?module=pagemaster&PAGE_user_op=view_page&PAGE_id=430&MMN_position=230:4:229).

Example of a file to look at, "b-on.gif":
1) GET b-on.gif in live Web / Got 302
2) GET b-on.gif in live Web / Got 404
3) GET b-on.gif in Web Archive / Got 302
4) GET b-on.gif in Web Archive / Got 200

Original issue reported on code.google.com by whisp...@gmail.com on 30 Jul 2012 at 4:00

GoogleCodeExporter commented 9 years ago
This also impacts the loading time of pages:

- Additional DNS queries are made.
- If contents such as images and scripts still exist on the live Web, they are 
loaded and then, after the JS rewrites their paths to point to the archive, 
they are requested and loaded again.

Original comment by whisp...@gmail.com on 30 Jul 2012 at 4:04

GoogleCodeExporter commented 9 years ago
A possible solution could be to defer the loading of referenced resources 
(images, css, js, ...) until the paths are rewritten by the Web archive.
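
To make the idea concrete, here is a minimal sketch of one way the deferral 
could be done on the server side, before the page reaches the browser. It is 
only an illustration under assumptions, not the project's actual code: the 
names `defer_resources`, `DeferringRewriter`, `FETCHING_ATTRS` and the 
`data-deferred-` attribute prefix are all hypothetical. The sketch renames the 
`src`/`href` attributes of `img`, `script` and `link` tags to inert `data-*` 
attributes, so the browser has nothing to fetch until the archive's rewriting 
JS (assumed to exist, and assumed to be injected after this step so that it is 
not deferred itself) restores them with archive URLs.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping: tag name -> attribute that makes the browser fetch.
FETCHING_ATTRS = {"img": "src", "script": "src", "link": "href"}
# Hypothetical prefix for the inert attribute that parks the original URL.
DEFER_PREFIX = "data-deferred-"


class DeferringRewriter(HTMLParser):
    """Re-serialize HTML, parking resource URLs in inert data-* attributes."""

    def __init__(self):
        # Leave character references in text untouched so they round-trip.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_tag(self, tag, attrs, closing):
        parts = [tag]
        for name, value in attrs:
            if FETCHING_ATTRS.get(tag) == name:
                name = DEFER_PREFIX + name  # e.g. src -> data-deferred-src
            if value is None:
                parts.append(name)  # valueless attribute such as "async"
            else:
                # The parser unescapes attribute values, so escape them again.
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + closing)

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def defer_resources(html: str) -> str:
    """Return the HTML with img/script/link URLs moved to data-deferred-*."""
    rewriter = DeferringRewriter()
    rewriter.feed(html)
    rewriter.close()
    return "".join(rewriter.out)


if __name__ == "__main__":
    print(defer_resources('<img src="b-on.gif" alt="b-on">'))
```

With this in place, the two live-Web requests for "b-on.gif" in the example 
above would not be issued, because the browser never sees a `src` for it. The 
sketch does not cover other fetch triggers such as `srcset`, `iframe`, or 
`url(...)` references inside CSS, which would need the same treatment.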

Original comment by whisp...@gmail.com on 30 Jul 2012 at 4:06

GoogleCodeExporter commented 9 years ago

Original comment by whisp...@gmail.com on 30 Jul 2012 at 5:59