JayBizzle / Crawler-Detect

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent
https://crawlerdetect.io
MIT License

Performance and general clean up #312

Closed · MaxGiting closed 5 years ago

MaxGiting commented 5 years ago

I haven't done any comparisons yet, but removing regexes and shortening others can only help, right?

So far I have:

I've made sure that every regex removed had a related user agent in the tests.

I am also going to shorten a lot of the really long regexes; they just don't need to be that long.

What are people's thoughts on adding the word extractor to the generic regex? That would eliminate the need for another six regexes which I feel are overly specific to individual bots. A sketch of the idea is below.
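
A minimal sketch of the idea; the patterns and user agents here are hypothetical examples, not the library's actual fixtures:

```php
<?php
// Illustrative only: these names are made up for the example.

// Several bot-specific patterns...
$specific = ['LinkExtractorPro', 'WebExtractor', 'MailExtractor'];

// ...could be replaced by one generic token in the main pattern list.
$generic = 'extractor';

$userAgents = [
    'Mozilla/5.0 (compatible; LinkExtractorPro/1.1)',
    'WebExtractor/2.0 (+http://example.com/bot)',
    'MailExtractor/0.9',
];

foreach ($userAgents as $ua) {
    // With case-insensitive matching, 'extractor' hits all three UAs
    // that previously each needed their own pattern.
    if (preg_match('/' . $generic . '/i', $ua)) {
        echo "Crawler detected: {$ua}\n";
    }
}
```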

JayBizzle commented 5 years ago

LGTM 👍

MaxGiting commented 5 years ago

If there is whitespace in a regex, is there a rule that it must be escaped with a backslash? I vaguely remember that this is for people using the raw export, not something PHP itself requires.

For example, these two lines differ:

https://github.com/JayBizzle/Crawler-Detect/blob/0935d1eb9932f1109c740ae0aa1d9c0c6dc0c8a5/src/Fixtures/Crawlers.php#L557
https://github.com/JayBizzle/Crawler-Detect/blob/0935d1eb9932f1109c740ae0aa1d9c0c6dc0c8a5/src/Fixtures/Crawlers.php#L558
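
For what it's worth, PCRE itself doesn't need the escape: a literal space matches itself unless the `/x` (extended) modifier is used, where unescaped whitespace in the pattern is ignored. So the backslash would only matter for consumers of the raw export whose engines run in an extended mode. A quick standalone check:

```php
<?php
// A literal space matches itself in normal mode; escaping only matters
// under the /x (extended) modifier, which strips unescaped whitespace.

var_dump(preg_match('/Some Bot/',   'Some Bot')); // int(1)
var_dump(preg_match('/Some\ Bot/',  'Some Bot')); // int(1) - identical match
var_dump(preg_match('/Some Bot/x',  'Some Bot')); // int(0) - space stripped under /x
var_dump(preg_match('/Some\ Bot/x', 'Some Bot')); // int(1) - escape keeps the space
```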

MaxGiting commented 5 years ago

44 regexes removed. A lot of tidying up as well. All listed below.

MaxGiting commented 5 years ago

I've seen roughly a 4-6% increase in speed. It will easily be eaten up as we add more user agents, but it's a good clean-out nonetheless.
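
For anyone wanting to take their own measurement, here's a rough micro-benchmark sketch against the library's public API. The iteration count and user agents are arbitrary choices for illustration, not how the figure above was produced:

```php
<?php

require 'vendor/autoload.php';

use Jaybizzle\CrawlerDetect\CrawlerDetect;

$detect = new CrawlerDetect;

// One known bot, one ordinary browser, so both code paths get exercised.
$userAgents = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0 Safari/537.36',
];

$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    foreach ($userAgents as $ua) {
        $detect->isCrawler($ua);
    }
}
printf("%.3fs for 20k checks\n", microtime(true) - $start);
```

Running this before and after a change to the pattern list gives a crude but repeatable way to compare the two revisions.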