clj-commons / rewrite-clj

Rewrite Clojure code and edn
https://cljdoc.org/d/rewrite-clj
MIT License
591 stars 55 forks source link

More efficiently seek to position #171

Open mainej opened 2 years ago

mainej commented 2 years ago

Problem/Opportunity

find-last-by-pos is slower than it could be, especially when used to find zlocs at the end of files.

Suppose for example that you have a zipper at the root of the following code, and know the row and column of b-b-a.

(a
 (a-a
  (a-a-a)
  (a-a-b)
  (a-a-c)))

(b
 (b-a
  (b-a-a)
  (b-a-b)
  (b-a-c))
 (b-b
  (b-b-a))) ;; <-- we want to get here, to b-b-a

(c)

The problem with using find-last-by-pos to find the node is that it uses z/next*, which means it does a depth first traversal of the tree. But, since a's end row precedes b-b-a's start row, neither a nor any of its children or grandchildren will be b-b-a. Similarly, neither b-a nor its children can be b-b-a. Therefore traversing a's or b-a's branches is wasted time. To find a node near the end of a large file ends up traversing hundreds or thousands of nodes that cannot possibly be the desired node.

Proposed Solution

I propose that either find-last-by-pos be made more efficient, or that new tools be introduced, tools specialized to the constraints of navigation by position.

The essential idea is to change the navigation algorithm to skip z/right* over nodes that can't contain the position, before descending z/down* and recursing in nodes that do contain the position. You can see an implementation of this idea here.

Aside: The idea for this request was developed in https://github.com/clojure-lsp/clojure-lsp/pull/793. I encourage you to look at that PR for additional context, noting especially the performance analysis of large files. Be aware that the original implementation used a function that, though it was named find-last-by-pos, wasn't rewrite-clj's version. That function was however essentially the same algorithm as rewrite-clj's, and so I expect the performance characteristics to be similar.

Additional context

In regards to whether to replace z/find-last-by-pos, note that the clojure-lsp version accepts a row and col, but, z/find-last-by-pos accepts a full range. I haven't thought too carefully about the consequences of that difference. I think that using the end row/col of the range would return the same results, but perhaps that isn't considering every case.

There are a few variants of this algorithm that rewrite-clj may want to support.

One is to find both the start and end zloc in a range (see, for example, this clojure-lsp code). This can be optimized beyond simply searching from the initial node twice, by taking advantage of the fact that we know the end must follow the start. A conservative version of this algorithm looks like the following pseudocode. There may be more efficient algorithms which don't search all the way to the top before beginning the search for the end loc.

(let [start-loc (z/find-at-pos root start-row start-col)]
  [start-loc (z/find-at-pos (z/top start-loc) end-row end-col)])

Another variant is to find the minimal set of nodes that span a range. That is, find the deepest, shortest set of sibling nodes that contain the start and end of the range. Admittedly, I don't have a use-case for this scenario and it may be too specialized for rewrite-clj.

Action I contributed the clojure-lsp code, and would be willing to port it to rewrite-clj. I've talked with @ericdallo, one of the clojure-lsp maintainers, who said that'd be OK.

To use an optimized z/find-last-by-pos, clojure-lsp would also need #170, since it doesn't use position tracking. Alternatively, if something like find-by-heritability were exposed as a public method in rewrite-clj, it could use that. But either way, that consideration shouldn't affect this discussion.

ericdallo commented 2 years ago

Pinging @lread as may be interested on discussing this

lread commented 2 years ago

Two issues in one day? @mainej, you are spoiling me!

This sounds like an interesting idea, but I'll have to take a deeper peek.

Lucky for me, you've done a great job describing the issue.

mainej commented 2 years ago

Haha, sorry—I've had these on my list for a long time and finally got around them. I hope I'm not inundating you. :)

lread commented 2 years ago

Not at all, I am genuinely delighted!