atmire / COUNTER-Robots

Official list of user agents that are regarded as robots/spiders by COUNTER
MIT License

Java pattern is too specific and has a syntax error #63

Open alanorth opened 9 months ago

alanorth commented 9 months ago

I just noticed the user agent Java/21 in my server access log. We currently have the following pattern in COUNTER-Robots:

^java\/\d{1,2}.\d

This pattern matches java/1.8 but not Java/21 (see https://regex101.com/r/pweujD/1). I also just realized that the dot should be escaped so that it is interpreted as a literal dot rather than as a regex metacharacter.
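
For anyone who wants to reproduce this locally, here is a minimal Python sketch of the behavior described above. It assumes the pattern is applied case-insensitively (hence `re.IGNORECASE`), and the `Java/1x8` string is a made-up user agent used only to show what the unescaped dot lets through.

```python
import re

# Current pattern from COUNTER-Robots, copied from the issue text.
# Assumption: consumers of the list match case-insensitively.
CURRENT = re.compile(r"^java\/\d{1,2}.\d", re.IGNORECASE)

# "Java/1x8" is a hypothetical user agent, not one seen in real logs.
SAMPLES = ["Java/1.8", "Java/21", "Java/1x8"]

# Map each sample user agent to whether the current pattern matches it.
results = {ua: CURRENT.search(ua) is not None for ua in SAMPLES}

for ua, matched in results.items():
    print(f"{ua!r}: {'match' if matched else 'no match'}")
```

The pattern requires at least one more character plus a digit after the first one or two digits, which is why a bare `Java/21` is missed, and the unescaped dot accepts any character in that position.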

I suggest updating the pattern to ^java\/\d+, or perhaps even just ^java. Either is enough to uniquely identify the user agent.
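
A rough Python sketch comparing the two candidate patterns, again assuming case-insensitive matching. The names `CANDIDATES` and `SAMPLES` are just for this illustration, and `JavaFX-App/1.0` is a hypothetical user agent included to show how much broader the bare prefix is.

```python
import re

# The two replacement patterns suggested in this issue.
# Assumption: consumers of the list match case-insensitively.
CANDIDATES = {
    "with_digits": re.compile(r"^java\/\d+", re.IGNORECASE),
    "prefix_only": re.compile(r"^java", re.IGNORECASE),
}

# "JavaFX-App/1.0" is a hypothetical user agent, used only for comparison.
SAMPLES = ["Java/21", "Java/1.8", "JavaFX-App/1.0", "Mozilla/5.0 (Java)"]

# For each candidate, collect the sample user agents it matches.
matched = {
    name: [ua for ua in SAMPLES if pattern.search(ua)]
    for name, pattern in CANDIDATES.items()
}

for name, uas in matched.items():
    print(name, uas)
```

Both candidates catch `Java/21` and `Java/1.8`. The prefix-only form would also catch any other agent whose string begins with "java", which may or may not be desirable.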