UADetector is a library that identifies over 190 different desktop and mobile browsers and 130 other User-Agents such as feed readers, email clients and multimedia players. It can also identify more than 400 robots such as BingBot, Googlebot or Yahoo Bot.
I've examined why user agent parsing is slow. Here are some tips:
Robot detection could be done with just a HashMap<String, Robot>. Note that there is no regexp here, only an exact string comparison inside a linear scan:
AbstractUserAgentStringParser.examineAsBrowser()
    for (final Robot robot : data.getRobots()) {
        if (robot.getUserAgentString().equals(builder.getUserAgentString())) {
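A minimal sketch of the HashMap idea, assuming the robot list is known when the data is loaded. `SketchRobot` and `RobotIndex` are hypothetical stand-ins, not classes from UADetector; only the `getUserAgentString()` getter is taken from the loop quoted above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the library's Robot type; it models only the
// getter used in the quoted loop, plus a name for demonstration.
final class SketchRobot {
    private final String name;
    private final String userAgentString;

    SketchRobot(String name, String userAgentString) {
        this.name = name;
        this.userAgentString = userAgentString;
    }

    String getName() { return name; }
    String getUserAgentString() { return userAgentString; }
}

// Hypothetical index built once per data load.
final class RobotIndex {
    private final Map<String, SketchRobot> byUserAgent = new HashMap<>();

    RobotIndex(List<SketchRobot> robots) {
        for (SketchRobot robot : robots) {
            // Assumption: if the data contains duplicate strings, the first
            // entry wins.
            byUserAgent.putIfAbsent(robot.getUserAgentString(), robot);
        }
    }

    // Exact-match lookup; returns null when the string is not a known robot.
    SketchRobot find(String userAgentString) {
        return byUserAgent.get(userAgentString);
    }
}
```

The lookup cost then no longer grows with the number of robots in the data, while the result for an exact match stays the same as with the loop.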
Lazy OS detection. The OS is not always needed, so it could be detected only when it is first requested.
Lazy device detection. Same here: the device is not always needed.
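One way the lazy detection could look, as a sketch only. `Lazy` and `LazyAgentSketch` are hypothetical names, and the detectors are passed in as plain suppliers because the real detection code is not shown here.

```java
import java.util.function.Supplier;

// Hypothetical helper: runs the computation on first access and caches the result.
final class Lazy<T> implements Supplier<T> {
    private Supplier<? extends T> compute;
    private T value;
    private boolean resolved;

    Lazy(Supplier<? extends T> compute) {
        this.compute = compute;
    }

    @Override
    public synchronized T get() {
        if (!resolved) {
            value = compute.get();
            resolved = true;
            compute = null; // the detector is no longer needed once resolved
        }
        return value;
    }
}

// Hypothetical result object: OS and device are only detected when a getter is called.
final class LazyAgentSketch {
    private final Lazy<String> operatingSystem;
    private final Lazy<String> device;

    LazyAgentSketch(Supplier<String> osDetector, Supplier<String> deviceDetector) {
        this.operatingSystem = new Lazy<>(osDetector);
        this.device = new Lazy<>(deviceDetector);
    }

    String getOperatingSystem() { return operatingSystem.get(); }
    String getDevice() { return device.get(); }
}
```

A caller that only reads the browser name never pays for the OS or device regexes. The trade-off is that the result object has to keep a reference to the UA string and the data until the getters are called.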
The whole regular expression loop. It is probably good for development and maintenance, but not so great for performance. Here is an idea:
We can make an enum of cheap tests and give each browser an EnumSet of them, then check whether that set contains the relevant enum value before testing the regex. Example:
EnumTest1: user agent starts with the string "Mozilla"
If this returns false, don't test any regexp that starts with /^Mozilla
EnumTest2: user agent starts with the string "M"
If this returns true, don't test any regexp that starts with /^ but not with /^M
There are 631 regexps: 150 start with /^Mozilla and 246 start with /^ but not with /^M. These two checks can be implemented without any change to uasdata.
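The enum pre-filter could be sketched roughly as follows. All names (`UaTest`, `PrefilteredPattern`, `PrefilterSketch`) are hypothetical. The sketch assumes each regex has already been tagged with the tests it depends on, and that the regexes are case-sensitive, so a plain `startsWith` is a valid shortcut.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Cheap string tests, evaluated once per user agent string.
enum UaTest {
    STARTS_WITH_MOZILLA {
        @Override boolean passes(String ua) { return ua.startsWith("Mozilla"); }
    },
    STARTS_WITH_M {
        @Override boolean passes(String ua) { return ua.startsWith("M"); }
    };

    abstract boolean passes(String ua);

    static EnumSet<UaTest> evaluate(String ua) {
        EnumSet<UaTest> passed = EnumSet.noneOf(UaTest.class);
        for (UaTest test : values()) {
            if (test.passes(ua)) {
                passed.add(test);
            }
        }
        return passed;
    }
}

// A regex plus the tests that decide whether it is worth running at all.
final class PrefilteredPattern {
    final Pattern pattern;
    final Set<UaTest> requires;   // regex can only match if all of these passed
    final Set<UaTest> excludedBy; // regex cannot match if any of these passed

    PrefilteredPattern(String regex, Set<UaTest> requires, Set<UaTest> excludedBy) {
        this.pattern = Pattern.compile(regex);
        this.requires = requires;
        this.excludedBy = excludedBy;
    }

    boolean worthTrying(Set<UaTest> passed) {
        return passed.containsAll(requires) && Collections.disjoint(passed, excludedBy);
    }
}

final class PrefilterSketch {
    // Rules that survive the cheap tests, in their original order.
    static List<PrefilteredPattern> candidates(String ua, List<PrefilteredPattern> rules) {
        Set<UaTest> passed = UaTest.evaluate(ua);
        List<PrefilteredPattern> result = new ArrayList<>();
        for (PrefilteredPattern rule : rules) {
            if (rule.worthTrying(passed)) {
                result.add(rule);
            }
        }
        return result;
    }

    // First surviving rule whose regex matches, or null if none does.
    static PrefilteredPattern firstMatch(String ua, List<PrefilteredPattern> rules) {
        for (PrefilteredPattern rule : candidates(ua, rules)) {
            if (rule.pattern.matcher(ua).find()) {
                return rule;
            }
        }
        return null;
    }
}
```

In this sketch a regex starting with /^Mozilla would carry `requires = {STARTS_WITH_MOZILLA}`, and one starting with /^ followed by a literal other than M would carry `excludedBy = {STARTS_WITH_M}`. Tagging by hand or by a conservative rule is safer than guessing from the regex text, since something like /^[A-Z] is anchored but can still match a string that starts with M.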
There could also be a list of words that the UA string has to contain. Split the UA string into a HashSet of words and check these rules before the regexp. This would be fast. Example:
requiredWords: mozilla, AppleWebKit, NetFrontLifeBrowser
test: if ( words.containsAll( requiredWords ) )
This would probably need a new field for required words in uasdata.
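A rough sketch of the required-words check. `RequiredWordsSketch` is a hypothetical name. The sketch assumes words are runs of ASCII letters and digits, and that the comparison is case-insensitive, since the example mixes "mozilla" and "AppleWebKit". A set is used rather than a map because only membership matters.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class RequiredWordsSketch {
    // Split the UA string once into a set of lower-cased words.
    static Set<String> words(String userAgent) {
        Set<String> result = new HashSet<>();
        for (String token : userAgent.split("[^A-Za-z0-9]+")) {
            if (!token.isEmpty()) {
                result.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return result;
    }

    // Cheap pre-check: the regex is only worth running if every required
    // word occurs somewhere in the UA string.
    static boolean mayMatch(Set<String> uaWords, List<String> requiredWords) {
        for (String word : requiredWords) {
            if (!uaWords.contains(word.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }
}
```

The word set is built once per UA string and reused for every rule, so each rule costs only a few hash lookups before its regex is considered.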
Regards, Pavel