JefferyHus / es6-crawler-detect

:spider: This is an ES6 adaptation of the original PHP library CrawlerDetect. This library helps you detect bots/crawlers/spiders via the user agent.

Fix preparing userAgents from headers #44

Closed. pro2s closed this 2 years ago

pro2s commented 2 years ago

The user agent string gets "undefined" prepended and a space appended when the user agent is taken from the headers while creating the crawler detector instance. For example, 'user-agent': 'b0t' becomes undefinedb0t
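
For illustration, a minimal sketch of how this symptom can arise when header values are concatenated onto an uninitialized accumulator (the actual code in crawler.js may differ):

    // Hypothetical sketch: string concatenation turns an undefined
    // accumulator into the literal text "undefined".
    const headers = { 'user-agent': 'b0t', accept: '*/*' };

    let userAgent; // undefined
    userAgent += headers['user-agent'] + ' ';

    console.log(userAgent);               // "undefinedb0t "
    console.log(/^b0t$/.test(userAgent)); // false, so an exact pattern no longer matches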

pro2s commented 2 years ago

It also fixes an issue at this line https://github.com/JefferyHus/es6-crawler-detect/blob/451daf91effdcf13d13ba787ef40569c8dce010c/src/lib/crawler.js#L103 when the request has no user agent header, e.g. curl -A "" https://...
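
A minimal guard for that case, assuming the goal is simply to avoid calling .replace on undefined (not necessarily the fix adopted in this PR):

    // Hypothetical guard: fall back to an empty string so .replace is
    // never called on undefined when the User-Agent header is missing.
    function normalizeUserAgent(userAgent) {
      return (userAgent || '').replace(/\s+/g, ' ').trim();
    }

    console.log(normalizeUserAgent(undefined)); // "" instead of a TypeError
    console.log(normalizeUserAgent(' b0t '));   // "b0t"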

JefferyHus commented 2 years ago

Can you please share your code so I can base some test samples on it?

pro2s commented 2 years ago

Hi @JefferyHus, I added an additional test, and it fails with the current version of Crawler.

    it('should identify the crawler from request headers with exact pattern', async () => {
      crawler = new Crawler({
        headers: { 'user-agent': 'b0t', accept: '*/*' },
      });

      assert.strictEqual(crawler.isCrawler(), true);
    });

In my code we use Crawler like this:

    const detector = new Crawler(request);
    if (detector.isCrawler()) {
      console.log('bot');
    } else {
      console.log('user');
    }

Currently, when I check the server response with curl: for curl -I -A "" http://localhost:3000 I get an error, because in this case userAgent is undefined.

 TypeError: Cannot read properties of undefined (reading 'replace')
    at Crawler.isCrawler (/node_modules/es6-crawler-detect/src/lib/crawler.js:103:19)

For curl -I -A b0t http://localhost:3000 I get "user" in the console, even though we have a ^b0t$ rule. For curl -I http://localhost:3000 I get "bot" in the console, because the curl rule is matched.
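
The three curl cases above can be sketched with plain regexes (the library's real pattern list is much longer; the mangled value is the one described earlier):

    // Sketch of the behaviour described above.
    const exactRule = /^b0t$/;
    const curlRule = /curl/;

    console.log(exactRule.test('undefinedb0t '));  // false: mangled UA, reported as "user"
    console.log(exactRule.test('b0t'));            // true: what the fix should produce
    console.log(curlRule.test('curl/7.79.1'));     // true: default curl UA, reported as "bot"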

JefferyHus commented 2 years ago

Thanks, I will check this out. Looking at the code, I would rather see a for... loop instead of map; one reason is that the event loop then awaits the loop to resolve before continuing to the next tick.
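
A short sketch of the distinction being suggested, assuming there is asynchronous work per item (the work function here is hypothetical):

    // With map, the async callbacks are fired but their promises are not
    // awaited, so execution continues before the work resolves.
    async function withMap(items, work) {
      items.map(async (item) => { await work(item); });
    }

    // With for...of, each await resolves before the next iteration starts.
    async function withFor(items, work) {
      for (const item of items) {
        await work(item);
      }
    }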

pro2s commented 2 years ago

OK, I will try to use for... instead of map.

pro2s commented 2 years ago

Fixed the map and added a test for an empty user agent header.
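
A sketch of what the described change might look like (hypothetical helper and parameter names; the actual diff in this PR may differ): build the user agent with a loop and skip headers that are not present, so missing values never become the string "undefined":

    // Hypothetical sketch of the fix described above.
    function prepareUserAgent(headers, uaHeaderNames) {
      let userAgent = '';
      for (const name of uaHeaderNames) {
        const value = headers[name.toLowerCase()];
        if (value) {
          userAgent += `${value} `;
        }
      }
      return userAgent.trim();
    }

    // 'b0t' stays 'b0t' instead of becoming 'undefinedb0t'
    console.log(prepareUserAgent({ 'user-agent': 'b0t' }, ['User-Agent']));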