amoilanen / js-crawler

Web crawler for Node.JS
MIT License

Would be awesome to apply a selector to limit scope of crawled links #15

Open duggi opened 9 years ago

duggi commented 9 years ago

For example:

crawler.crawl({
  url: "http://localhost:8080/locations/",
  selector: ".main-content"
});

would only follow the links found inside .main-content.

This way I don't have to keep crawling the header, footer, sidebars, etc. on every page.

Thank you for writing this!

amoilanen commented 9 years ago

Hi,

This could be an interesting feature. The only problem is that, at the moment, the crawler does not treat the page content as a DOM; it handles it as plain text. But maybe we can limit the section of the page that should be crawled in some other way. I will investigate this a bit more.
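
One possible text-only way to limit the crawled section, without building a DOM, is to look for links only between two marker strings in the page source. The sketch below is an illustration, not part of js-crawler: the helper name `extractScopedLinks` and the `<!-- crawl:start -->` / `<!-- crawl:end -->` markers are assumptions, and it presumes you control the page templates well enough to add such markers around the main content.

```javascript
// Hypothetical helper (not a js-crawler API): collect href values that appear
// between two marker strings, treating the page as plain text.
function extractScopedLinks(html, startMarker, endMarker) {
  const start = html.indexOf(startMarker);
  if (start === -1) {
    return []; // no start marker, so nothing is in scope
  }
  const from = start + startMarker.length;
  const end = html.indexOf(endMarker, from);
  // If the end marker is missing, scan to the end of the page.
  const region = end === -1 ? html.slice(from) : html.slice(from, end);

  // Match <a ... href="..."> or <a ... href='...'>; a simple pattern that
  // does not handle unquoted attribute values.
  const hrefPattern = /<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  const links = [];
  let match;
  while ((match = hrefPattern.exec(region)) !== null) {
    links.push(match[1] !== undefined ? match[1] : match[2]);
  }
  return links;
}

// Example page: only the links between the markers should be returned.
const page = [
  '<div class="header"><a href="/home">Home</a></div>',
  '<!-- crawl:start -->',
  '<div class="main-content">',
  '  <a href="/locations/1">One</a> <a href=\'/locations/2\'>Two</a>',
  '</div>',
  '<!-- crawl:end -->',
  '<div class="footer"><a href="/about">About</a></div>'
].join('\n');

const scopedLinks = extractScopedLinks(page, '<!-- crawl:start -->', '<!-- crawl:end -->');
console.log(scopedLinks); // [ '/locations/1', '/locations/2' ]
```

This does not give real selector support such as `.main-content`; that would need an HTML parser. It only shows that the header and footer links can be skipped while still working on the page as text.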