Closed: MaxGiting closed this 5 years ago
LGTM 👍
Added extractor to the generic regex, which removes the need for 6 other regexes.

Removed the [0-9] range from 34 regex patterns. It seemed overkill.

If there is whitespace in a regex, is there a rule that it must be escaped with a backslash? I vaguely remember this being down to people using the raw export, not because of PHP.
For example, this line and the next differ:
https://github.com/JayBizzle/Crawler-Detect/blob/0935d1eb9932f1109c740ae0aa1d9c0c6dc0c8a5/src/Fixtures/Crawlers.php#L557 https://github.com/JayBizzle/Crawler-Detect/blob/0935d1eb9932f1109c740ae0aa1d9c0c6dc0c8a5/src/Fixtures/Crawlers.php#L558
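On the escaping question: in PCRE (and in Python's `re`, used here for illustration since the behavior is the same), a backslash before a space is redundant in normal mode. It only matters under the extended (`x`) modifier, where unescaped whitespace in the pattern is ignored. A minimal sketch with a made-up user agent:

```python
import re

ua = "Mozilla/5.0 (compatible; SomeBot crawler)"

# In normal mode an escaped and an unescaped space match identically,
# so the backslash is redundant.
assert re.search(r"SomeBot crawler", ua)
assert re.search(r"SomeBot\ crawler", ua)

# Under the extended flag, bare whitespace in the pattern is ignored
# ("SomeBot crawler" is read as "SomeBotcrawler"), so only the escaped
# form still matches a literal space.
assert not re.search(r"SomeBot crawler", ua, re.X)
assert re.search(r"SomeBot\ crawler", ua, re.X)
```

So unless the patterns are compiled with the extended modifier, the escaping is purely cosmetic, which would explain the inconsistency between the two linked lines.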
44 regexes removed, plus a lot of tidying up. All changes are listed below.
Added to the generic regex:

- checker
- reader
- extractor
- monitoring
- analyzer

Other tidy-ups:

- Removed the [0-9] range where not needed.
- Removed \.com where not needed.
- Removed \/ where not needed.
- Reduced the length of some very long regexes.
- Removed \ before whitespace, as there's no need to escape it; this makes it consistent across all regexes.
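The consolidation above can be sketched as follows. The alternation words come from the list, but the pattern itself and the user agents are illustrative, not the project's actual Crawlers.php entries (shown in Python; PHP's preg_match behaves the same way for a simple alternation like this):

```python
import re

# One generic, case-insensitive alternation covers a whole family of
# bots whose only distinguishing feature is one of these words.
generic = re.compile(r"checker|reader|extractor|monitoring|analyzer", re.I)

# Hypothetical user agents that previously each needed their own regex.
agents = [
    "Mozilla/5.0 (compatible; FooChecker/1.0)",
    "BarReader/2.3 (+http://example.com/bot)",
    "BazExtractor",
    "UptimeMonitoring 1.1",
    "Analyzer2000",
]
assert all(generic.search(a) for a in agents)

# An ordinary browser string contains none of the words, so it passes.
assert not generic.search("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
```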
Seen roughly a 4-6% increase in speed. It will easily be eaten up as we add more user agents, but it's a good clean-out nonetheless.
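A rough intuition for why removing patterns helps: each compiled pattern costs one scan of the user agent string, so collapsing many patterns into a single alternation cuts the number of scans. A hypothetical micro-benchmark sketch (Python's `re`, made-up pattern words; not a measurement of Crawler-Detect itself):

```python
import re
import timeit

# A typical non-bot user agent that matches nothing below.
ua = "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 Chrome/90 Safari/537.36"

# 40 separate hypothetical patterns vs. one combined alternation.
words = ["checker", "reader", "extractor", "monitoring", "analyzer"] * 8
separate = [re.compile(w, re.I) for w in words]
combined = re.compile("|".join(words), re.I)

# Worst case (no match): the separate list scans the string 40 times,
# the combined pattern scans it once.
t_many = timeit.timeit(lambda: any(p.search(ua) for p in separate), number=2000)
t_one = timeit.timeit(lambda: combined.search(ua), number=2000)
print(f"separate: {t_many:.4f}s  combined: {t_one:.4f}s")
```

The absolute numbers are machine-dependent; the point is only that the combined form does strictly less scanning in the non-match case, which is the common case for real visitors.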
I haven't done any comparisons yet, but removing regexes and shortening others can only help, right?
So far I have:

- Added checker to the generic regex, which removes the need for 13 other regexes.
- Added reader to the generic regex, which removes the need for 10 other regexes.

I've made sure that every regex removed had a related user agent in the tests.
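That safety check can be sketched as a small test: for each regex being removed, assert that its known user agent both matched the old pattern and is still caught by the generic one. The pattern names and agents below are hypothetical, not the project's real fixtures:

```python
import re

# The generic pattern after the additions described above (sketch).
generic = re.compile(r"checker|reader", re.I)

# Map of removed regex -> a user agent from the test fixtures that it
# used to match (both made up for illustration).
removed = {
    r"SiteChecker": "SiteChecker/1.0 (+http://example.com)",
    r"FeedReader": "FeedReader v2",
}

for old_pattern, agent in removed.items():
    assert re.search(old_pattern, agent)  # the removed regex did match it
    assert generic.search(agent)          # the generic regex still does
print("all removed regexes remain covered")
```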
I am going to shorten a lot of the really long regexes as they just don't need to be so long.
What are people's thoughts on adding the word extractor to the generic regex? This would eliminate the need for another 6 regexes, which I feel are overly specific to individual bots.