ekalinin / robots.js

Parser for robots.txt for node.js
MIT License
66 stars 21 forks

Two bug fixes #4

mlodz closed 12 years ago

mlodz commented 12 years ago

Hi ekalinin,

We discovered a bug where the callback in the RobotsParser constructor gets called multiple times, because large robots.txt files are returned through the request object in multiple chunks. The fix changes the response's data listener to collect the chunks, and adds an end listener on the response that combines the chunks and then calls the callback once.
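For readers who want to see the shape of the chunk-collecting fix, here is a minimal sketch. It is not the code from the patch: `collectBody` and `demoCollect` are hypothetical names, and the response is simulated with a plain `EventEmitter` so the snippet runs without a network call.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper (not part of robots.js): buffers every 'data' chunk
// and invokes the callback exactly once, after 'end', with the full body.
function collectBody(response, callback) {
  const chunks = [];
  let done = false;

  // Guard so the callback cannot fire twice (e.g. 'error' followed by 'end').
  const finish = (err, body) => {
    if (done) return;
    done = true;
    callback(err, body);
  };

  response.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
  response.on('end', () => finish(null, Buffer.concat(chunks).toString('utf8')));
  response.on('error', (err) => finish(err));
}

// Simulated response that delivers a robots.txt body in two chunks.
function demoCollect() {
  const fakeResponse = new EventEmitter();
  const calls = [];
  collectBody(fakeResponse, (err, body) => calls.push(body));
  fakeResponse.emit('data', 'User-agent: *\n');
  fakeResponse.emit('data', 'Disallow: /private\n');
  fakeResponse.emit('end');
  return calls;
}
```

Calling the parser's callback from inside the `data` listener is what produces one call per chunk; deferring it to `end` gives a single call with the complete file.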

We also found a bug that crashes the parser if a rule in robots.txt is malformed (if it has an unquoted %). We handle it by simply ignoring the rule, rather than guessing what it is supposed to match.
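A rough sketch of the "ignore the malformed rule" approach follows. It assumes the crash comes from percent-decoding the rule path with `decodeURIComponent`, which throws a `URIError` on a stray `%`; whether the parser decodes paths this way is an assumption, and `parseRulePath` and `collectRulePaths` are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: returns the decoded path, or null when the path
// contains a malformed percent sequence and the rule should be ignored.
function parseRulePath(rawPath) {
  try {
    return decodeURIComponent(rawPath);
  } catch (err) {
    // decodeURIComponent throws URIError for a '%' that is not followed
    // by two hex digits; skip such rules instead of crashing.
    if (err instanceof URIError) return null;
    throw err;
  }
}

// Keeps only the rules whose paths could be decoded.
function collectRulePaths(rawPaths) {
  return rawPaths.map(parseRulePath).filter((path) => path !== null);
}
```

For example, `collectRulePaths(['/a%20b', '/100%_off', '/ok'])` keeps the first and last entries and drops the middle one, since `%_o` is not a valid escape.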

Both of these problems were triggered by Wikipedia's robots.txt file, in case you are interested in reproducing the bugs.

Thanks, Steve

ekalinin commented 12 years ago

Hi,

Great patch. Applied. Thanks.