Laplandia / owaspantisamy

Automatically exported from code.google.com/p/owaspantisamy

SAX implementation is significantly slower than DOM for lots of small input documents #136

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Our application processes lots of small input documents (for example, 
user-entered chat messages or status updates). It appears that the SAX 
implementation is considerably slower than the DOM implementation.

Consider this test case (you can just add it to the end of AntiSamyTest.java):

    public void testSpeedSmallInputs() throws ScanException, PolicyException {
        long    start;
        long    finish;
        final int       iters = 5000;
        final String    inputHTML = "<p>This is a pretty simple input document.</p>";

        start = System.currentTimeMillis();
        for ( int i = 0; i < iters; i++ )
        {
            as.scan( inputHTML, policy, AntiSamy.DOM).getCleanHTML();
        }
        finish = System.currentTimeMillis();
        System.out.println( "DOM: " + (finish-start)/1000.0);

        start = System.currentTimeMillis();
        for ( int i = 0; i < iters; i++ )
        {
            as.scan( inputHTML, policy, AntiSamy.SAX).getCleanHTML();
        }
        finish = System.currentTimeMillis();
        System.out.println( "SAX: " + (finish-start)/1000.0);
    }

My output (in seconds) is:
DOM: 0.666
SAX: 4.481

I get much better results if I apply the attached patch (to cache the 
TransformerFactory).
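
For readers without the attachment, the general idea can be sketched roughly as follows. This is not the attached patch: the class name `CachedTransformerFactoryDemo` and its methods are hypothetical, and I am only assuming that the cost comes from calling `TransformerFactory.newInstance()` on every scan. The sketch creates the factory once, guards it with a lock (the API does not promise that the factory is thread-safe), and still hands out a fresh `Transformer` per call.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of AntiSamy: illustrates caching the factory.
final class CachedTransformerFactoryDemo {

    // TransformerFactory.newInstance() performs a provider lookup on each call,
    // so the factory is created once here and reused afterwards.
    private static final TransformerFactory FACTORY = TransformerFactory.newInstance();

    // Returns a fresh Transformer per call; a Transformer must not be shared
    // between threads. The lock is there because the factory's thread safety
    // is an assumption we avoid relying on.
    static Transformer newTransformer() {
        try {
            synchronized (FACTORY) {
                return FACTORY.newTransformer();
            }
        } catch (TransformerException e) {
            throw new IllegalStateException("could not create Transformer", e);
        }
    }

    // Runs an identity transform over a small XML string, using the cached factory.
    static String identity(String xml) {
        try {
            Transformer t = newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) throws Exception {
        final int iters = 5000;

        // Uncached: a new factory for every transformer.
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            TransformerFactory.newInstance().newTransformer();
        }
        long uncached = System.nanoTime() - start;

        // Cached: the shared factory is reused.
        start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            newTransformer();
        }
        long cached = System.nanoTime() - start;

        System.out.println("uncached factory: " + uncached / 1e9 + " s");
        System.out.println("cached factory:   " + cached / 1e9 + " s");
    }
}
```

Running `main` prints the time for creating 5000 transformers both ways; the exact numbers will depend on the JVM and on which XSLT implementation is on the classpath.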

Original issue reported on code.google.com by danr...@gmail.com on 25 Jun 2012 at 10:05

GoogleCodeExporter commented 9 years ago
Great patch. Will integrate into next version.

Original comment by arshan.d...@gmail.com on 17 Sep 2012 at 1:55

GoogleCodeExporter commented 9 years ago
Fixed in 38cc85ec7ef41c16f3ced13bbf01934320c8bbd2, thanks for the patch!

"Unfortunately" DOM is still a heck of a lot faster in 1.5 for small strings, 
since I've performance-tuned it extensively.

Original comment by kristian...@gmail.com on 28 Jan 2013 at 5:40