bellroy / predictionbook

Find out just how sure you should be, and get better at being only as sure as the facts justify.
http://predictionbook.com/

EasouSpider is sending weird content causing exceptions to be raised #59

Closed wezm closed 10 years ago

wezm commented 10 years ago
A ArgumentError occurred in predictions#show:

  invalid %-encoding (ò’`øX–µA8H4 û8⁄ÇN ∑„ÄáOui: 7flÏ    íFrF … 5Î ÀSm”ôñÇ’™ à m 8öKÏ „∑æˇ¥K
 xì› ´ñ5Öì<wÔÜ 'ê‹6‹Ñù€fl© ¡„pAS<Ïá_{QªQ“ã˚¬Ÿm %Xt®<Æ™·]fi   ΩñÉa<d|ôÏíc±5Ω±´¥
*   v%^ò≈åÿ^ í  ˜ (IÍ‘¬ª7 ©Ü# πî ·È5é ¨] è ç?W >ò∆g •-lO™(ÊÏi”Âүƈ )
  /usr/local/ruby/1.9.3-p484/lib/ruby/1.9.1/uri/common.rb:898:in `decode_www_form_component'

-------------------------------
Request:
-------------------------------

  * URL       : http://predictionbook.com/predictions/22213
  * Parameters: {"action"=>"show", "controller"=>"predictions", "id"=>"22213"}

-------------------------------
Environment:
-------------------------------

  * CONTENT_LENGTH                                 : 514
  * CONTENT_TYPE                                   : application/x-www-form-urlencoded
  * HTTP_ACCEPT                                    : text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
  * HTTP_ACCEPT_ENCODING                           : gzip, deflate
  * HTTP_ACCEPT_LANGUAGE                           : zh;q=0.9,en;q=0.8
  * HTTP_CONNECTION                                : close
  * HTTP_HOST                                      : predictionbook.com
  * HTTP_REFER                                     : http://predictionbook.com/
  * HTTP_USER_AGENT                                : Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
  * ORIGINAL_FULLPATH                              : /predictions/22213
  * PATH_INFO                                      : /predictions/22213
  * REQUEST_METHOD                                 : GET
  * REQUEST_URI                                    : /predictions/22213
  * SCRIPT_NAME                                    :
  * SCRIPT_URI                                     : http://predictionbook.com/predictions/22213
  * SCRIPT_URL                                     : /predictions/22213

RenWenshan commented 10 years ago

This seems to be the spider of a Chinese search engine ("Sou" means "search" in Chinese, and I suspect "Ea" stands for "Easy").

Anyway, according to the discussion at https://github.com/rack/rack/issues/337, this can be solved by adding the rack-robustness gem: https://github.com/blambeau/rack-robustness
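
For illustration, the general idea (turn the decoding failure into a 400 response instead of an unhandled exception) can be sketched as a small hand-rolled Rack-style middleware. This is only a rough sketch and not the rack-robustness API: the class name `RejectBadEncoding` and the stand-in `DECODING_APP` are hypothetical, and it assumes the `ArgumentError` propagates up through the middleware stack when the body is decoded.

```ruby
require 'uri'
require 'stringio'

# Hypothetical middleware (name invented for this sketch): converts the
# "invalid %-encoding" ArgumentError into a 400 response.
class RejectBadEncoding
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  rescue ArgumentError => e
    # Only handle the specific failure from the report; re-raise anything else.
    raise unless e.message.start_with?('invalid %-encoding')

    [400, { 'Content-Type' => 'text/plain' }, ["Bad Request\n"]]
  end
end

# Stand-in for the real application (an assumption for this sketch): it
# decodes the request body with the same method that appears in the backtrace.
DECODING_APP = lambda do |env|
  decoded = URI.decode_www_form_component(env['rack.input'].read)
  [200, { 'Content-Type' => 'text/plain' }, [decoded]]
end

app = RejectBadEncoding.new(DECODING_APP)
status, _headers, _body = app.call('rack.input' => StringIO.new('%zz'))
puts status
```

In a Rails app, something like this would presumably be registered with `config.middleware.use RejectBadEncoding`, but whether it catches the error depends on where in the stack the parameters are parsed.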

bf4 commented 10 years ago

I did something like this: https://gist.github.com/bf4/d26259acfa29f3b9882b#file-exception_app-rb

wezm commented 10 years ago

@bf4 after some research we decided to just block the spider at the Apache level instead of swallowing the exception. It's apparently known to behave badly and would not refer a meaningful number of users anyway.