fb55 / readabilitySAX

A fast, platform-independent Readability port (JS)
BSD 2-Clause "Simplified" License

Issue with wikipedia link #28

Closed by colin-jack 12 years ago

colin-jack commented 12 years ago

The link in question is:

http://en.wikipedia.org/wiki/Concern_troll#Concern_troll

The content retrieved is "Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice."

colin-jack commented 12 years ago

With http://d.pr/i/WzsI you get a similar response:

"Go away evil bot! Cause: 'No User-agent header' If you're not a bot, please accept my apologies and get in touch with the Droplr team so they can fix this. - The Droplr anti-bot Marshal"

fb55 commented 12 years ago

I need to rewrite the node parts of readabilitySAX. Adding a user-agent header is on my todo-list.

colin-jack commented 12 years ago

Dunno if it's related, but I get a 403 for a link like http://tools.ietf.org/html/draft-sinnema-xacml-media-type-00
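
The first two responses above explicitly complain about a missing User-Agent header; the 403 may or may not share that cause. Until the node parts are rewritten, one possible workaround is to fetch the page yourself with a User-Agent set and hand the HTML to the parser. The following is a minimal sketch using only Node's built-in `http`/`https` modules. `buildRequestOptions`, `fetchHtml`, and the `USER_AGENT` value are hypothetical names made up for this example, not part of readabilitySAX, and redirects are not followed.

```javascript
// Hypothetical helpers for illustration; not part of readabilitySAX's API.
const http = require("http");
const https = require("https");

// Assumed placeholder: replace with your own tool name and contact info.
const USER_AGENT = "my-readability-script/0.1 (contact: you@example.com)";

// Build request options that always carry a User-Agent header.
// The URL fragment (#...) is dropped because it is never sent to the server.
function buildRequestOptions(pageUrl, userAgent = USER_AGENT) {
  const parsed = new URL(pageUrl);
  return {
    protocol: parsed.protocol,
    hostname: parsed.hostname,
    port: parsed.port || undefined, // fall back to the default port
    path: parsed.pathname + parsed.search,
    headers: { "User-Agent": userAgent },
  };
}

// Fetch a page body as a string; rejects on any non-2xx status code.
function fetchHtml(pageUrl, userAgent) {
  const options = buildRequestOptions(pageUrl, userAgent);
  const client = options.protocol === "https:" ? https : http;
  return new Promise((resolve, reject) => {
    client
      .get(options, (res) => {
        if (res.statusCode < 200 || res.statusCode >= 300) {
          res.resume(); // discard the body so the socket is freed
          reject(new Error("HTTP " + res.statusCode + " for " + pageUrl));
          return;
        }
        res.setEncoding("utf8");
        let body = "";
        res.on("data", (chunk) => { body += chunk; });
        res.on("end", () => resolve(body));
      })
      .on("error", reject);
  });
}
```

Calling `fetchHtml("http://d.pr/i/WzsI").then(html => ...)` would then send the header on every request; whether a given site accepts the placeholder string above is something you would need to check against that site's policy.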