JayBizzle / Crawler-Detect

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
https://crawlerdetect.io
MIT License

Feature: Extend Crawlers data array #328

Closed peppeocchi closed 5 years ago

peppeocchi commented 5 years ago

Hello and thanks for this great package!

Introduction

The purpose of this PR is to allow the Crawlers data array to be extended with custom entries. I understand that the preferred way to add new crawlers to the list, and to contribute back to the community, is to open a PR against this repo with the new user agent strings. Sometimes, though (for instance while a PR adding a specific user agent is still pending), it would be faster to simply extend the data array so that custom user agents are recognised as crawlers, without writing a custom implementation.

Description of changes

A new method has been added to the AbstractProvider that allows the data array to be extended with custom strings. I added it to the abstract provider so it can be used by the other classes extending it (both Exclusions and Headers), although I haven't wired it into CrawlerDetect for any class other than Crawlers.
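As a rough illustration of the approach described above, here is a minimal sketch of what such a method on the abstract provider might look like. Class, method, and property names here are assumptions based on the description in this PR, not the actual merged API.

```php
<?php

// Hypothetical sketch only: the real AbstractProvider in
// jaybizzle/crawler-detect may differ in names and structure.
abstract class AbstractProvider
{
    /**
     * @var array List of regex fragments used to build the compiled pattern.
     */
    protected $data = [];

    /**
     * Merge one or more custom patterns into the data array.
     * Accepts a single string or an array of strings.
     */
    public function extend($custom)
    {
        $this->data = array_merge($this->data, (array) $custom);

        return $this;
    }

    /**
     * Return the full list of patterns (built-in plus custom).
     */
    public function getAll()
    {
        return $this->data;
    }
}
```

After extending, the detector would need to recompile its regex from the updated list, which is why the PR description mentions that the regex "would compile again" from the new data.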

Example usage

Usage is very simple: you can pass either a string or an array of strings, which is then merged into the main data array, and the regex is compiled again from the new list.

$cd = new CrawlerDetect(/*some user agent string*/);

$cd->extendCrawlers('some_user_agent');

$cd->extendCrawlers([
    'some_user_agent_1',
    'some_user_agent_2',
    # more user agents....
]);

Conclusions

Please feel free to make any change you might want to make before merging it, I would really love a feature like this to be built in the package rather than create my custom implementation just to add a couple of user agents to be detected as crawlers - or as I mentioned, it would be very useful while a PR waits to be merged with new user agents (e.g. https://github.com/JayBizzle/Crawler-Detect/pull/327)

Thanks!

JayBizzle commented 5 years ago

We really appreciate your input on this, but as you have pointed out, we have refrained from making it extendable solely to try and persuade people to create PRs that will benefit the entire community.

Thanks 👍

peppeocchi commented 5 years ago

Thanks for the reply, I thought so but it was worth a try! The reason behind this PR is that we run a multi-tenant e-commerce platform, and even one or two bots that aren't recognised and blocked in a reasonable time will cost us money. Thanks anyway

gplumb commented 5 years ago

You could always use a fork and push bot PRs upstream to the main project :-)


peppeocchi commented 5 years ago

> You could always use a fork and push bot PRs upstream to the main project :-)

I'm using this package indirectly via the Laravel implementation, so it would mean forking that package as well, which still requires a custom implementation for such a small change. I'll stick with the "extra if" until a simpler solution comes up...
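For reference, the "extra if" workaround mentioned above could look something like this: ask the library first, then fall back to a hand-maintained pattern list. The wrapper function name and the custom patterns are illustrative assumptions; only `CrawlerDetect::isCrawler()` is part of the package's actual API.

```php
<?php

use Jaybizzle\CrawlerDetect\CrawlerDetect;

// Sketch of the workaround: library check first, then custom patterns.
function isCrawler($userAgent)
{
    $detect = new CrawlerDetect();

    if ($detect->isCrawler($userAgent)) {
        return true;
    }

    // Patterns not (yet) merged upstream; placeholders here.
    $custom = [
        'some_user_agent_1',
        'some_user_agent_2',
    ];

    foreach ($custom as $pattern) {
        if (preg_match('/' . $pattern . '/i', $userAgent)) {
            return true;
        }
    }

    return false;
}
```

The downside, as noted in the thread, is that this logic lives outside the package and has to be maintained separately until the patterns land upstream.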

JayBizzle commented 5 years ago

We normally merge PRs fairly quickly. We just had a lot on recently with other work :+1: