amoilanen / js-crawler

Web crawler for Node.JS
MIT License
253 stars 55 forks

A small change to omit comments in html body #20

Closed (tibetty closed 8 years ago)

tibetty commented 8 years ago

Reason: HTML comments might mislead the link parsing that follows.

In function Crawler.prototype._getAllUrls = function(baseUrl, body):

    body = body.replace(/<!--.*?-->/g, ''); // Added by @tibetty to omit commented-out content, which might mislead the link parsing that follows
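To illustrate the effect of the proposed change, here is a minimal standalone sketch. The helper names `stripHtmlComments` and `extractHrefs` and the sample page are hypothetical and are not part of js-crawler; the link regex is a deliberately naive stand-in for real link parsing. The sketch uses `[\s\S]` rather than `.` so that comments spanning several lines are also removed, since `.` does not match newlines in JavaScript without the `s` flag.

```javascript
// Hypothetical helpers for illustration only; not js-crawler's actual API.

// Remove HTML comments. [\s\S] matches any character including newlines,
// and *? keeps the match non-greedy so each comment is removed separately.
function stripHtmlComments(html) {
  return html.replace(/<!--[\s\S]*?-->/g, '');
}

// Naive href extraction, standing in for the crawler's link parsing.
function extractHrefs(html) {
  const hrefs = [];
  const linkPattern = /<a[^>]+href=["']([^"']+)["']/gi;
  let match;
  while ((match = linkPattern.exec(html)) !== null) {
    hrefs.push(match[1]);
  }
  return hrefs;
}

const page = [
  '<a href="/live">live</a>',
  '<!-- <a href="/old">old</a> -->',
  '<!--',
  '<a href="/multi">multi-line</a>',
  '-->'
].join('\n');

// Without stripping, commented-out links are picked up as well.
console.log(extractHrefs(page));
// After stripping, only the link a browser would render remains.
console.log(extractHrefs(stripHtmlComments(page)));
```

A regex is only an approximation here: it will also strip comment-like text inside script or style elements, so a crawler that needs exact behavior would have to use a real HTML parser instead.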

amoilanen commented 8 years ago

Thank you, this seems reasonable, since commented-out links should be ignored by a browser. I will merge it now and publish a newer version of js-crawler a bit later. I want to do some refactoring of js-crawler and add more tests before that.