cursorless-dev / cursorless

Don't let the cursor slow you down
https://www.cursorless.org/
MIT License
1.13k stars 79 forks source link

Investigate DFA compilation times #2614

Closed pokey closed 1 month ago

pokey commented 1 month ago

They are up over 700ms for both me and @AndreasArvidsson. We should do another git bisect to see what caused it

pokey commented 1 month ago

@AndreasArvidsson suggested that https://github.com/cursorless-dev/cursorless/pull/2457 is the likely culprit, and I'm inclined to agree

AndreasArvidsson commented 1 month ago

I wonder if we could claw back some of the compilation time by removing cursorless_surrounding_pair_force_direction

AndreasArvidsson commented 1 month ago

@phillco

I've done some benchmarking. All figures are in milliseconds and performed with a restart in between in a python file.

It's obvious that surrounding pair is dfa intensive. A reason for this is that surrounding pairs is actually multiple list and captures. We have a list for pairs that are only selectable, but not wrapable, we have a list for pairs that are both selectable and wrapable, and we have a list for pairs that are just wrapable, but not selectable.

I think we could increase dfa performance quite a bit if we just did a single list with all the scope types. This means that when we read our csv files instead of creating a bunch of different lists we would basically create a single list with all the scope types. Give the value as something unique so we can distinguish them. eg surroundingPair.parentheses and then when the user speaks that scope we take the identifier and construct the actual scope. This would get rid of multiple Talon list captures and make it just a bag of words.

The only scope that we can't do this with and that still needs to be its separate capture is cursorless_glyph_scope_type. The reason for this is that it has an argument <user.any_alphanumeric_key>. In theory we could listen for changes to the registry and pull this out and construct a single list, but that feels a bit to hacky?

What I'm proposing is that we make a single list for all the simple scope types(line, funk, etc), all the surrounding pairs(round, diamond), and all the custom regexes. This hopefully would decrease dfa compilation time quite a lot. I can't of course know how much until I actually try it and this is quite an invasive rewrite. We have of course thousands of grammar test so I'm not worry that I won't get it right. It's more a matter of if this is a direction we prefer? Unfortunately I don't think we have much of a choice if we want to keep compilation time down.

phillco commented 1 month ago

Re cursorless_surrounding_pair_force_direction unfortunately we found these are still referenced in the documentation: https://github.com/cursorless-dev/cursorless/issues/2625