Closed redmunds closed 10 years ago
Note that the original bug in the referenced issue is different from what I described above. Start with this comment if you care to look at the Brackets issue.
Could you set up a simple demo that shows A) what you're doing, and B) the bad performance.
The demo is simple, but not small -- the bug is loading JS for code hinting in Brackets. This is the test I am using in Brackets on Win7:

`window`
Note: Worst performance is on initial load, then much better on subsequent hinting due to caching. To run test a second time, switch to another file & back to this one, or close & re-open file.
Using the Chrome Dev Tools profiler, the time in `CodeMirror.getTokenAt()` doubled between Brackets Sprint 29 and Sprint 30:

Sprint 29: 717ms
Sprint 30: 1497ms
You can get Brackets downloads by going to http://download.brackets.io/ and clicking on "All Downloads" button.
Okay, but that is a big project loaded into another big project, not a small demo. I need to see how you are using `getTokenAt`. Can you reproduce the issue in a simple static HTML page with a small script that does something equivalent to what you are doing in Brackets? Failing that, can you at least point me at the Brackets code that is making the calls to `getTokenAt`?
(And some background: the patch you linked to concerns a behavior where, if a token is requested at a point that was never parsed before, CM will try to find a parser state for a line before that. It will scan a certain range, and if it doesn't find a state, will simply use the start state, so that it doesn't have to synchronously parse the whole document. However, for mixed modes, there is no obvious way to determine whether we are in a sub-mode, so it will use a larger range, to increase the chances of seeing the token that started the sub-mode. The computed state will be saved, so that subsequent calls to get nearby tokens will reuse it. The only way I can see this making a large difference is if you're making multiple calls to `getTokenAt` in a really big document, so that the computed ranges somehow don't overlap, or if you're doing something like `setOption("mode", ...)` in between `getTokenAt` calls, which would clear all the cached parser state. Or is only the first call slow?)
You asked for a simple demo, not necessarily a small demo :)
In general, the code is synchronously parsing an entire htmlmixed document. The purpose is to extract JS from `<script>` blocks and maintain line numbers as it goes.
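As a toy illustration of that goal (hypothetical, not Brackets' actual code): given the line ranges of the `<script>` bodies, the JS text is pulled out along with its starting line, so that positions in the extracted JS can be mapped back to the original document:

```javascript
// Toy illustration: extract each script block's text plus its original start line.
// `blocks` is a hypothetical list of {startLine, endLine} ranges (0-based,
// inclusive) describing where each <script> body sits in the document.
function extractScriptText(docLines, blocks) {
  return blocks.map(function (b) {
    return {
      startLine: b.startLine,
      text: docLines.slice(b.startLine, b.endLine + 1).join("\n")
    };
  });
}
```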
It starts at the beginning of the doc and calls `getTokenAt()` until it hits the end. I have only looked at the average time for calls to `getTokenAt()`, so I am not sure if the first call takes significantly longer.
The code calls `CodeMirror.innerMode(outerMode, ctx.token.state).mode.name` in the loop for each token, so `setOption("mode", ...)` could be getting called implicitly by some other method.
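The walk described above can be sketched roughly like this (a hypothetical simplification, not Brackets' actual code; it assumes a CodeMirror-like editor object exposing `lineCount()`, `getLine()`, and `getTokenAt()`):

```javascript
// Rough sketch of a whole-document token walk driven by getTokenAt().
// getTokenAt({line, ch}) is assumed to return the token containing
// position ch on that line, as {start, end, string, ...}.
function walkTokens(cm, onToken) {
  for (var line = 0; line < cm.lineCount(); line++) {
    var lineLength = cm.getLine(line).length;
    var ch = 0;
    while (ch < lineLength) {
      // Ask for the token covering the character just past `ch`.
      var token = cm.getTokenAt({line: line, ch: ch + 1});
      onToken(line, token);
      if (token.end <= ch) break; // guard against non-advancing tokens
      ch = token.end;
    }
  }
}
```

Each call to `getTokenAt()` is a separate position lookup, which is exactly why the caching behavior of the parser state matters so much for this pattern.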
It would take me a while to put together a similar demo in an HTML page, so hopefully you can take a quick look at the code.
The main loop is in the `HTMLUtils.findBlocks()` function:

https://github.com/adobe/brackets/blob/master/src/language/HTMLUtils.js#L476

The `TokenUtils.moveNextToken()` function is defined here:

https://github.com/adobe/brackets/blob/master/src/utils/TokenUtils.js#L76
Any advice on better techniques for parsing an htmlmixed (or any other mode) document would be appreciated. Thanks.
Aha, the problem is the `precise` flag that Narciso added in [1]. This does not play well with the caching mechanism.

I'd suggest that if you're going to traverse the whole document anyway, you use the runmode addon [2] to do your own iteration, which is going to be a number of orders of magnitude more efficient than `getTokenAt`.

[1]: https://github.com/marijnh/CodeMirror/commit/52558bf921b61ac194b1e71be4ab3c8cda50aa06
See also https://github.com/marijnh/CodeMirror/commit/99192a96f44e55eb395ad007330168e24d866c8d, which fixes the problem that caused this slowdown, and will probably help a lot here. But I'd still recommend using runmode instead.
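For reference, a runmode-based pass looks roughly like the sketch below. This is hedged: it assumes the runmode addon's `CodeMirror.runMode(source, modeSpec, callback)` entry point, which tokenizes the source in one linear scan and invokes the callback once per token with the token text and its style. The helper simply adapts that callback shape into a token list; `collectTokens` itself is a hypothetical name, not part of CodeMirror.

```javascript
// Collect every token in one linear pass. `runMode` is expected to behave
// like CodeMirror.runMode from the runmode addon: it tokenizes `source`
// with the given mode and calls callback(text, style) once per token
// (style is null for plain text).
function collectTokens(runMode, source, modeSpec) {
  var tokens = [];
  runMode(source, modeSpec, function (text, style) {
    tokens.push({text: text, style: style});
  });
  return tokens;
}

// With CodeMirror plus the htmlmixed/xml/javascript/css modes loaded,
// usage would be something like:
//   var tokens = collectTokens(CodeMirror.runMode, htmlSource, "htmlmixed");
```

Because runmode scans the document once from the start, it avoids the repeated position lookups (and the cache-defeating `precise` behavior) that make per-position `getTokenAt` calls expensive.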
Yes, the `precise` flag is the culprit. cc: @njx
Time spent in `getTokenAt()` dropped from 1523ms to 82ms! What a difference a cache makes :)
I'll also pass on the info about the runmode addon to the team.
Thanks! Closing.
Ah. Originally the intent of the flag was only to be used in cases where we were looking up a single token, not in scanning the whole document. But it makes sense to reset the frontier afterwards to make future lookups faster.
Brackets uses the tokenizer to parse the HTML doc and extract JS from `<script>` tags for Code Hints. We recently tracked down a slowdown in the tokenizer to this change:

https://github.com/marijnh/CodeMirror/commit/ba21985bfa7c2b8ad6b1ecd564a41b06083d2475

Can you help me understand what that change fixes, so we can try to fine-tune it to get the performance back? Or maybe you can suggest a better method for extracting content from a tag in an htmlmixed page? Note that we're also seeing this slowdown for `<style>` tags.