bellroy / lesswrong-migrated

Automatically exported from code.google.com/p/lesswrong

Images in recent posts hosted on wiki.lesswrong.com are disallowed in robots.txt #357

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Recent posts by Eliezer, such as
http://lesswrong.com/lw/ezu/stuff_that_makes_stuff_happen/ , contain images
hosted on wiki.lesswrong.com, for example
http://wiki.lesswrong.com/mediawiki/images/6/6b/Markov.svg

The robots.txt file of wiki.lesswrong.com disallows crawling anything under
the /mediawiki/ folder, which includes these images:
Disallow: /mediawiki/

As a result, archivers such as the Wayback Machine can't archive or display the
images in these posts. For example, the snapshot of the page above is missing
most of its images:
http://liveweb.archive.org/http://lesswrong.com/lw/ezu/stuff_that_makes_stuff_happen/
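A quick way to see why the images are blocked: robots.txt Disallow rules match by path prefix, and /mediawiki/images/6/6b/Markov.svg starts with /mediawiki/. The sketch below checks this with Python's standard urllib.robotparser. The "User-agent: *" group line and the "ia_archiver" agent string are assumptions for illustration; the issue only quotes the Disallow line.

```python
from urllib.robotparser import RobotFileParser

# The rule quoted in the issue. The "User-agent: *" line is assumed;
# the issue only shows the Disallow line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mediawiki/
"""

# An image embedded in the post, taken from the issue.
IMAGE_URL = "http://wiki.lesswrong.com/mediawiki/images/6/6b/Markov.svg"


def is_crawlable(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    The default agent string is illustrative; any agent falls under the
    "User-agent: *" group here.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# /mediawiki/images/... starts with /mediawiki/, so the prefix rule blocks it.
print(is_crawlable(ROBOTS_TXT, IMAGE_URL))  # False
```

Paths outside /mediawiki/ are unaffected by this rule, which is why the posts themselves on lesswrong.com get archived while their wiki-hosted images do not.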

I think this restriction should be lifted.

Original issue reported on code.google.com by robot...@gmail.com on 18 Oct 2012 at 11:35

GoogleCodeExporter commented 9 years ago
Can we get an estimate on this?

Original comment by lukep...@gmail.com on 13 Dec 2013 at 11:09

GoogleCodeExporter commented 9 years ago

Original comment by wjmo...@gmail.com on 15 Dec 2013 at 10:14

GoogleCodeExporter commented 9 years ago
It seems to work now (or worked a few days ago), with some delay and error
messages. For example, when I made the snapshot
http://web.archive.org/web/20131213072752/http://lesswrong.com/lw/eqn , the
image http://wiki.lesswrong.com/mediawiki/images/3/30/E.jpg failed to display,
and opening it on its own from the snapshot produced a robots.txt error.

Saving that image alone via the URL
http://web.archive.org/save/http://wiki.lesswrong.com/mediawiki/images/3/30/E.jpg
also fails (it reliably reports 'Page cannot be crawled or displayed due to
robots.txt'). This doesn't happen with every image; for example,
http://web.archive.org/save/http://obstacol.com/wp-content/uploads/2012/06/3-or-4.jpg
works.

However, a few days later the image was somehow saved, and it now displays in
the snapshot
http://web.archive.org/web/20131213072752/http://lesswrong.com/lw/eqn as
http://web.archive.org/web/20131213072805im_/http://wiki.lesswrong.com/mediawiki/images/3/30/E.jpg .
I'm not sure what's going on here.

Original comment by robot...@gmail.com on 16 Dec 2013 at 5:21

GoogleCodeExporter commented 9 years ago
robots.txt now explicitly allows content under /mediawiki/images/, so I'm
closing this ticket.
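One way to express a fix like the one described above, as a minimal sketch: the thread only says content under /mediawiki/images/ is now explicitly allowed, so the file below is a hypothetical reconstruction, not the wiki's actual robots.txt. The /mediawiki/index.php path and the "ia_archiver" agent string are likewise illustrative assumptions. The check uses Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt after the fix (exact contents assumed).
# Allow is listed before Disallow: Python's parser applies the first
# matching rule, while RFC 9309 crawlers use the longest matching path,
# so this order gives the same answer under both.
FIXED_ROBOTS_TXT = """\
User-agent: *
Allow: /mediawiki/images/
Disallow: /mediawiki/
"""

# Image from the thread, plus an illustrative non-image path under /mediawiki/.
IMAGE_URL = "http://wiki.lesswrong.com/mediawiki/images/3/30/E.jpg"
OTHER_URL = "http://wiki.lesswrong.com/mediawiki/index.php"


def is_crawlable(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_crawlable(FIXED_ROBOTS_TXT, IMAGE_URL))  # True: images are allowed
print(is_crawlable(FIXED_ROBOTS_TXT, OTHER_URL))  # False: rest of /mediawiki/ stays blocked
```

Scoping the Allow rule to /mediawiki/images/ keeps the rest of the MediaWiki install out of crawlers while letting archivers fetch the images embedded in posts.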

Original comment by wjmo...@gmail.com on 30 Jan 2014 at 5:22