arouel / uadetector

UADetector is a library that identifies over 190 different desktop and mobile browsers and 130 other User-Agents, such as feed readers, email clients and multimedia players. In addition, more than 400 robots, such as BingBot, Googlebot and Yahoo Bot, can be identified.
http://uadetector.sourceforge.net/
Apache License 2.0

Increase performance using JRegex #53

Closed: pedroteixeira closed this issue 10 years ago

pedroteixeira commented 10 years ago

Hi,

Another library, https://github.com/fsiegrist/UASparser, uses JRegex and seems to be about twice as fast as this one. However, it currently doesn't return correct results for some user agents.

Are there any plans to improve performance? Would it be easy to swap between implementations to benchmark them?
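Swapping implementations for a benchmark is easiest when each library sits behind one small common interface. The sketch below is hypothetical: `UaParser`, `RegexUaParser`, `SwapDemo` and the two patterns are illustrative stand-ins, not the UADetector or UASparser API. In a real comparison you would write one adapter per library that calls that library's own parse method.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical common interface; each library under test gets its own adapter.
interface UaParser {
    String browserFamily(String userAgent);
}

// Toy adapter backed by java.util.regex; patterns are compiled once, up front.
final class RegexUaParser implements UaParser {
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();

    RegexUaParser() {
        patterns.put("Chrome", Pattern.compile("Chrome/\\d+"));
        patterns.put("Firefox", Pattern.compile("Firefox/\\d+"));
    }

    @Override
    public String browserFamily(String userAgent) {
        // Return the first family whose pattern occurs in the UA string.
        for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
            if (e.getValue().matcher(userAgent).find()) {
                return e.getKey();
            }
        }
        return "unknown";
    }
}

public class SwapDemo {
    public static void main(String[] args) {
        // Swapping implementations means changing only this line.
        UaParser parser = new RegexUaParser();
        String ua = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0";
        System.out.println(parser.browserFamily(ua)); // prints Firefox
    }
}
```

Because the benchmark code only sees `UaParser`, the same timing loop can be run against every adapter without other changes.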

arouel commented 10 years ago

@pedroteixeira Have you run thorough performance tests, or is this an assumption? JRegex is not known to be faster or slower than java.util.regex. Please provide a benchmark so that we can verify your claim.

Some tips and tricks can be found here: http://www.ibm.com/developerworks/java/library/j-benchmark1/

pedroteixeira commented 10 years ago

Thanks for the reply. I just ran a very simple local benchmark, something like:

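    // parser and input are set up beforehand (a parser instance and a UA string)
    long start;
    long totalNanos = 0;
    int count = 10 * 1000;

    // time each parse call; nanoTime is finer-grained than currentTimeMillis
    for (int i = 0; i < count; i++) {
        start = System.nanoTime();
        parser.parse(input);
        totalNanos += System.nanoTime() - start;
    }

    double elapsedMs = totalNanos / 1e6;
    System.out.println("elapsed: " + elapsedMs + "ms");
    System.out.println("avg: " + (elapsedMs / count) + "ms");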

For the same UA strings (with the same or a similar database, I assume) and without caching, I got roughly 600 hits/s with arouel/uadetector versus roughly 1200 hits/s with fsiegrist/UASparser.
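For sequential, single-threaded parsing, throughput and average time per call are reciprocals, so the figures above translate into the per-call average that the benchmark loop prints:

```latex
t_{\text{avg}} = \frac{1000\ \text{ms/s}}{\text{throughput}}, \qquad
\frac{1000}{600} \approx 1.67\ \text{ms per parse}, \qquad
\frac{1000}{1200} \approx 0.83\ \text{ms per parse}
```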

I assumed it was related to JRegex, but you're right, it could be something else.

arouel commented 10 years ago

Please rerun your comparison after warming up the JVM: first parse some user agent strings 10,000 times or more, and then measure the time. Thanks for your feedback.
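A minimal sketch of that procedure, assuming the parser under test can be wrapped as a `java.util.function.Function<String, ?>`: run a warm-up batch whose timing is discarded so the JIT compiler can optimize the hot code, then time whole batches with `System.nanoTime()` and report hits per second. `WarmBench`, `hitsPerSecond`, the iteration counts and the toy parser in `main` are illustrative assumptions, not part of UADetector. Each result's hash code is added to a checksum so the JIT cannot discard the parse calls as dead code.

```java
import java.util.function.Function;

public class WarmBench {
    // Accumulates result hash codes so the parse calls have an observable effect.
    static long sink;

    // Times `count` calls of `parser` on `input` as one batch; returns calls per second.
    static double hitsPerSecond(Function<String, ?> parser, String input, int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            sink += parser.apply(input).hashCode();
        }
        long elapsed = System.nanoTime() - start;
        return count / (elapsed / 1e9);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in; replace with a call into the library under test.
        Function<String, String> parser = ua -> ua.contains("Firefox") ? "Firefox" : "unknown";
        String input = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0";

        // Warm-up: run the same loop and ignore its timing.
        hitsPerSecond(parser, input, 10_000);

        // Measurement: several rounds, so run-to-run variation is visible.
        for (int round = 1; round <= 5; round++) {
            System.out.printf("round %d: %.0f hits/s%n", round, hitsPerSecond(parser, input, 10_000));
        }
        System.out.println("checksum: " + sink);
    }
}
```

Running the same harness against each implementation, in separate JVM runs, keeps one library's JIT state from influencing the other's numbers.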