ekalinin / robots.js

Parser for robots.txt for node.js
MIT License

Add a timeout handler #33

Open andymurd opened 3 years ago


Some sites (yeah, hello dfat.gov.au) will put a robot into a tarpit just for trying to download robots.txt. To escape, it's useful to be able to supply a timeout when constructing a parser, like this:

parser = new robots.RobotsParser(false, { headers: { userAgent: "USER_AGENT" }, timeout: 30000 });

This PR adds a handler for timeout events and treats them like errors.

Sorry, I couldn't get the unit tests to run because expresso is so far out of date, but I have run the code successfully against several sites.