caseydawsonjordan opened this issue 10 years ago
I'd be interested to know how you do your profiling?
I think the particular performance problem here is not specific to CE. If you have 100 template rules of this form then Saxon has to evaluate each of them against every element, and that's always going to be expensive. Your best bet for improving this is to build an index:
<xsl:key name="classkey" match="*" use="tokenize(@class, '\s')"/>
and then
match="key('classkey', 'topic/p')"
Even then, Saxon is going to test every element against 100 rules; the test will just be faster. Perhaps the pattern only applies to certain elements? In that case, using the element name in the pattern rather than "*" will help.
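Assembled, the indexed version might look like this (a sketch only: the key name and the 'topic/p' token are taken from the snippets above; the template bodies are placeholders, and the element name p in the last rule is an assumption about the vocabulary):

```
<!-- Index every element by its individual class tokens -->
<xsl:key name="classkey" match="*" use="tokenize(@class, '\s')"/>

<!-- Replaces match="*[contains(@class, ' topic/p ')]" -->
<xsl:template match="key('classkey', 'topic/p')">
  <!-- processing for topic/p elements -->
</xsl:template>

<!-- If the token is only ever used on a known element, naming the
     element narrows the set of candidate rules Saxon must test -->
<xsl:template match="p[contains(@class, ' topic/p ')]">
  <!-- processing for topic/p paragraphs -->
</xsl:template>
```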
In 3.0 for this one I would abandon template rules and use a map from class tokens (like 'topic/p') to functions; write the processing as a function, build a global variable containing a map from tokens to functions, and then do a dynamic function invocation based on the token value.
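That 3.0 approach might be sketched as follows (a sketch only, not Michael's actual code: the tokens and the trivial function bodies are illustrative, and it assumes the XPath 3.1 map constructor and map:contains):

```
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map">

  <!-- Global map from class token to processing function -->
  <xsl:variable name="rules" as="map(xs:string, function(*))" select="map {
      'topic/p' : function($e) { string($e) },
      'topic/q' : function($e) { upper-case(string($e)) }
    }"/>

  <xsl:template match="*[@class]">
    <!-- First class token that has an entry in the map -->
    <xsl:variable name="token"
        select="tokenize(@class, '\s')[map:contains($rules, .)][1]"/>
    <!-- Dynamic function invocation: no rule matching at all -->
    <xsl:sequence select="if (exists($token)) then $rules($token)(.) else ()"/>
  </xsl:template>
</xsl:stylesheet>
```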
In Saxon-CE, which is XSLT 2.0, we could perhaps emulate that using ideas from FXSL: something like
<xsl:variable name="rules"> <a token="topic/p"/> <b token="topic/q"/> ... </xsl:variable>
<xsl:key name="rulekey" match="*" use="@token"/>
<xsl:template match="a" mode="classtoken">...</xsl:template>
<xsl:template match="*[@class]"> <xsl:apply-templates select="key('rulekey', tokenize(@class, '\s'), $rules)" mode="classtoken"/> </xsl:template>
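Assembling that 2.0 emulation into one piece might look like this (a sketch under the same assumptions as above; the $src parameter is an addition not in the original sketch, needed because the classtoken rule's context item is the marker element in $rules rather than the source element):

```
<!-- One marker element per class token -->
<xsl:variable name="rules">
  <a token="topic/p"/>
  <b token="topic/q"/>
</xsl:variable>

<xsl:key name="rulekey" match="*" use="@token"/>

<xsl:template match="*[@class]">
  <!-- The third argument of key() restricts the search to the $rules tree;
       whichever marker element matches a class token fires its rule -->
  <xsl:apply-templates select="key('rulekey', tokenize(@class, '\s'), $rules)"
                       mode="classtoken">
    <xsl:with-param name="src" select="."/>
  </xsl:apply-templates>
</xsl:template>

<xsl:template match="a" mode="classtoken">
  <xsl:param name="src" as="element()"/>
  <!-- processing for 'topic/p', applied to $src -->
</xsl:template>
```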
Of course you haven't explained exactly what you are doing so I have had to make some guesses.
There remains a question as to whether we can get closer to the performance of the native XSLT engines in the browser. I think that so long as we're writing in Java which is translated to JavaScript which is interpreted by the browser, we're never going to be as fast as native code running directly in the browser. We're using some fantastic technologies, but they still impose an overhead.
Hi Michael, thanks for the detailed response.
Right now I am profiling by compiling Saxon-CE with GWT's DETAILED output style (-style=DETAILED) so that I can use Chrome's CPU profiler to see where the hotspots are. I run the transform once at the beginning without the profiler, then run 10 transforms back to back with the CPU profiler on, and then look at the results.
I like your suggestion about moving over to keys; that is something I can try, and I will report back with my findings. I also think we could move this over to pure element-name patterns and test that as well.
I am happy to provide more information about what I am doing if you have further questions. Right now I am just trying to determine whether it is possible to get this transform within 2-3x of the time the browser takes; if so, I think it opens up a lot of possibilities. If there is really a hard barrier around, say, 10x-30x, then a re-render becomes quite noticeable in the user experience.
I know that this could be improved by only re-rendering the parts of the document that have changed, but in our case that is not possible: we need to re-render the entire document each time a change is made.
Thanks!
Have there been any thoughts on how to improve the performance of Saxon-CE? I have a small XML document (10 KB) that I am transforming, and it takes between 100-300 ms with Saxon-CE; in comparison, the browser does the same transform in less than 10 ms.
(Note: with Saxon-CE I'm transforming to a string for profiling purposes, to remove any effects of DOM operations, which might be slow. I am also not including the time to load and parse the XSLT stylesheet or the XML document; this is purely the time taken for the XSLT engine to produce the output by applying the templates.)
I did some profiling and I see that most of the time is spent searching for the right matching template. In my case, about 80 ms is spent in CollatingFunction.evalContains().
I expect this is because lots of our templates use this match clause:
match="*[contains(@class, ' topic/p ')]"
I'd say our stylesheet has about 100 of these templates. I am looking for some general advice on how this could be improved so that I can start digging deeper into some optimizations.
Thanks!