Bartozzz / crawlerr

A simple and fully customizable web crawler/spider for Node.js with server-side DOM. Comes with elegant and hell-simple APIs.
https://npmjs.com/package/crawlerr
MIT License

Scanning websites stops after first match #34

Closed Dragnucs closed 6 years ago

Dragnucs commented 6 years ago

When I scan my website or the npm blog using the example as is, crawlerr stops finding matches after the first one. In addition, the params come back as undefined. Note that the crawler itself does not stop: it still prints success messages and can access the desired pages; it just fails to match them.

'use strict';

const crawler = require('crawlerr');
const spider = crawler('https://touha.me/');

// Expected to fire for every /post/<slug> page.
spider
  .when('/post/[all:slug]')
  .then(({ req, res, uri }) => {
    const slug = req.param('slug');
    console.log(`Found post ${slug}`);
  });

// Log crawl errors.
spider.on('error', error => {
  console.log(`[Error] ${error}`);
});

// Log each URL emitted with the 'request' event.
spider.on('request', url => {
  console.log(`[Success] ${url}`);
});

spider.start();

Output sample

Found post undefined
[Success] https://touha.me/#
[Success] https://touha.me/
[Success] https://touha.me/about/
[Success] https://touha.me/contact/
[Success] https://touha.me/cv/
[Success] https://touha.me/projets/
[Success] https://touha.me/selfhosting/
[Success] https://touha.me/index.xml
[Success] https://touha.me/page/2/
[Success] https://touha.me/post/meta-federated-social-network.en/
[Success] https://touha.me/tags/federation/
[Success] https://touha.me/tags/telecomunication/
[Success] https://touha.me/post/2-click-social-media-buttons-ou-le-partage-social-ethique-pratique/
[Success] https://touha.me/tags/pratique/
[Success] https://touha.me/tags/vie-priv%C3%A9e/
[Success] https://touha.me/tags/wordpress/
[Success] https://touha.me/post/les-clients-twitter-libre-natifs-gnu-linux/
[Success] https://touha.me/tags/twitter/
[Success] https://touha.me/post/rooter-samsung-galaxy-tab-3-avec-heimdall-sm-t210-sm-t210r/
[Success] https://touha.me/tags/android/
[Success] https://touha.me/tags/heimdall/
[Success] https://touha.me/tags/root/
[Success] https://touha.me/post/tablette-samsung-galaxy-tab-3-nouvelle-experomentation/
[Success] https://touha.me/tags/foss/
[Success] https://touha.me/tags/nonfree/
[Success] https://touha.me/tags/tablette/
[Success] https://touha.me/post/python-utiliser-gtksourceview-avec-fichier-glade/
[Success] https://touha.me/tags/glade/
[Success] https://touha.me/tags/gtk3/
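
For comparison, the behavior the output above was expected to show can be sketched in plain Node.js. This is only an illustration, not crawlerr's actual implementation: `compilePattern` and `matchPath` are hypothetical helpers, and the assumption that `[all:slug]` matches a single path segment is mine, not something stated in this thread. The sample URLs are taken from the output above.

```javascript
// Hypothetical helper (not part of crawlerr): turn a pattern such as
// '/post/[all:slug]' into a RegExp with one named group per placeholder.
function compilePattern(pattern) {
  const source = pattern
    // Escape regex metacharacters, leaving the placeholder syntax ([ ] :) intact.
    .replace(/[.*+?^${}()|\\]/g, '\\$&')
    // Assumption: '[all:name]' matches one or more non-slash characters.
    .replace(/\[all:(\w+)\]/g, '(?<$1>[^/]+)');
  // Anchor the pattern and tolerate an optional trailing slash.
  return new RegExp(`^${source}/?$`);
}

// Hypothetical helper: return the captured params for a URL, or null if
// its path does not match the compiled pattern.
function matchPath(regex, url) {
  const { pathname } = new URL(url);
  const match = regex.exec(pathname);
  return match ? { ...match.groups } : null;
}

const postRoute = compilePattern('/post/[all:slug]');

const sampleUrls = [
  'https://touha.me/',
  'https://touha.me/post/meta-federated-social-network.en/',
  'https://touha.me/tags/federation/',
  'https://touha.me/post/les-clients-twitter-libre-natifs-gnu-linux/',
];

for (const url of sampleUrls) {
  const params = matchPath(postRoute, url);
  if (params) {
    console.log(`Found post ${params.slug}`);
  }
}
```

Under these assumptions, both `/post/...` URLs produce a "Found post" line with a defined slug, while the home page and tag page are skipped.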

Bartozzz commented 6 years ago

Thanks for submitting this issue. I'll have a look at it when I come back from vacation.

Dragnucs commented 6 years ago

@Bartozzz please have a look at the pull request. I am not sure whether it is the right fix, but at least it points towards a possible solution.

Bartozzz commented 6 years ago

Yup, you've provided the right fix. Thanks again for contributing. Everything should work as expected now.