clj-commons / rewrite-clj

Rewrite Clojure code and edn
https://cljdoc.org/d/rewrite-clj
MIT License
591 stars 55 forks source link

Get Some Idea of Performance Cost of Using Position Tracking Zipper #172

Open lread opened 2 years ago

lread commented 2 years ago

I've been meaning to do this for a while. When asked, I have typically told folks that I expect the position tracking zipper incurs a performance hit, but maybe I should get at least some vague idea of what that hit is. And document it.

Inspired by #170, #171

lread commented 2 years ago

Not sure how many of these I'll hit, but environments of interest:

  1. Clojure
  2. ClojureScript (probably NodeJs only?)
  3. Graal native-image
  4. Graal native-image SCI interpreted (maybe through Babashka?)
borkdude commented 2 years ago

I think measuring 1 and 2 is the most important. If those get faster, then I expect that 3 and 4 also get faster (although sometimes 3 optimizes more aggressively.

In short, I would optimize for 1 (JVM) and 2 (but even then I think 1 is, given the areas where rewrite-clj is most used, is by far the most important).

lread commented 2 years ago

Yeah, that sounds very reasonable, thanks!

mainej commented 2 years ago

@lread I don't mean to steal your thunder, but I had to know...

Here's my analysis of cljs/core.cljs, the file you suggested. That file has 12K lines, which is way outside the norm for a Clojure file. But, it makes for interesting analysis...

It takes approximately the same amount of time to parse the file either as a tracked or untracked zipper. An untracked zipper can traverse all the nodes in about a quarter as much time.

cljs/core.cljs tracked stddev untracked stddev % of tracked
z/of-file 318ms 50ms 312ms 38ms 98%
(iterate z/next*) 422ms 30ms 111ms 1ms 26%

This equivalent parsing time and faster traversing time is explained by the fact that a tracked zipper calculates its position data on the fly, as you traverse it.

But I think this overstates the performance differences between tracked and untracked zippers. What if you were to do a little work at each node, and in particular, what if you were to ask each node about its position? Forcing both zippers to deal with position data would be a more apples-to-apples comparison.

A good candidate for that kind of comparison is z/find-last-by-pos, which needs to calculate node position data. As we're doing this, we can look at how z/find-last-by-pos fares navigating to a node near the top of the file, versus one near the bottom.

z/find-last-by-pos tracked stddev untracked * stddev % of tracked
top of file 909ms 8ms 653ms 5ms 72%
bottom of file 904ms 9ms 653ms 6ms 72%

* untracked z/find-last-by-pos needs a patched zraw/position that returns node metadata (see #170)

The first thing to notice is that z/find-last-by-pos takes about the same time whether finding a node at the top or bottom of the file. Since it uses last, it always inspects every node. The second thing to notice is that the untracked version takes about three quarters as much time. This isn't as big of a difference as with plain traversing, but still suggests that adding together z/extents, which a tracked zipper does as you traverse it, adds significant overhead.

While we're doing this, we might as well compare z/find-last-by-pos to the alternate implementation from clojure-lsp I described in #171.

lsp/find-at-pos tracked stddev untracked stddev % of tracked
top of file <1ms 0ms <1ms 0ms -
botttom of file 107ms 1ms 2ms 0ms 2%
% of z/find-las-by-pos 12% <1%

I think it's clear this algorithm is faster than z/find-last-by-pos. Finding a node at the top of the file is effectively instantaneous. Finding a node at the end of the file is also much faster, regardless of what type of zipper you use, but especially for untracked zippers. One thing to note is that the version of lsp/find-at-pos that I'm testing here only inspects the node's metadata. So, you'd think that the tracked and untracked times would be the same. But, apparently, to get all the way z/right* in this huge file takes about 100ms for a tracked zipper. The untracked zipper doesn't seem to have this overhead.

Even with the optimized algorithm it takes 107ms to get to the bottom of the file in a tracked zipper, versus 2ms for an untracked zipper. This suggests the tracked zipper isn't zippy enough for clojure-lsp. It was worth checking though.

Let me know if you want the code that generated these stats. Or, I could pretty easily re-run it with smaller files, if you'd like that for comparison.

All benchmarking done with criterium. JVM 11, Clojure 1.11. 2.5 GHz Dual-Core Intel Core i7, 16 GB RAM.

borkdude commented 2 years ago

This suggests the tracked zipper isn't zippy enough for clojure-lsp.

FWIW, when I started building clj-kondo I discovered that not using zippers at all was much better for performance, so I went that route.

mainej commented 2 years ago

when I started building clj-kondo I discovered that not using zippers at all was much better for performance

It appears that clojure-lsp leans on that discovery. It tends to construct a zipper only when it plans to edit a file. Otherwise for the most part it uses the (cached) analysis from clj-kondo.

lread commented 2 years ago

Hey ho! Thanks for the thoughtful analysis @mainej.

I barely started looking at this. I was also using cljs/core.cljs while poking around. Like you, I'm finding tracked/untracked vs untracked give similar results on parse. I used z/of-string where you used z/of-file, I see a mean of 331ms for untracked vs a mean of 318ms for tracked.

My next test was to try a z/next* until I hit the end of the source. Fun fact: this traverses 94173 nodes. For untracked I see a mean of 115ms and tracked gives me 495ms, so similar to your results.

But I think this overstates the performance differences between tracked and untracked zippers. What if you were to do a little work at each node, and in particular, what if you were to ask each node about its position?

I think this is very interesting, but I don't see it as the focus of this specific issue (still relevant to #171 tho!) Here I'm just trying to get a rough idea of the cost of choosing a position tracking zipper today.

mainej commented 2 years ago

I'm just trying to get a rough idea of the cost of choosing a position tracking zipper today

That's reasonable. So, to focus on that case a little bit more, I ran clj-async-profiler on a traversal of every node:

(let [tracked (z/of-file "/path/to/clojurescript/src/main/cljs/cljs/core.cljs"
                         {:track-position? true})]
    (println "\n\n\ntracked, traverse")
    (time
      (profiler/profile
        {:min-width 5}
        (dotimes [_ 25]
          (->> tracked
               (iterate z/next*)
               (take-while (complement z/end?))
               last
               z/node)))))

I'll attach the flamegraph here, but Github always "helpfully" converts the nicely interactive SVG where you can drilldown into subsections into a static image. If you want, I can share the original file somewhere else.

My summary is: the profiler was in rewrite-clj code about 92% of the time. That time breaks down roughly like this:

 3% (take-while (complement z/end?))
 3% other zipper navigation
10% clojure.core assoc/conj/etc.
 2% +extent
67% extent
 7% miscellaneous other

So, obviously, extent is the place to focus energy. To break that 67% down a little further:

 1% meta
 8% reverse
18% one of the filters for \newline
15% the other filter for \newline
21% (string node)
 4% miscellaneous other

rewrite_clj_traverse_tracked_zipper

lread commented 2 years ago

Q: To what extent do we have a performance issue? A: Yes.

Ya, that would have been my guess, but nice to see it in flames.

mainej commented 2 years ago

The best I've been able to do so far is the following, which benchmarks around 250ms, cutting the difference between tracked and untracked cursors by about 55% (with all the usual caveats: on this file, under this one workload of seeking through every node, on my machine).

        (let [s (string node)
              last-c (subs s (dec (count s)))
              lines (string/split s #"\n" -1) ;; -1 includes trailing empty lines
              rows (dec (count lines)) ;; all on one line moves the cursor zero rows
              cols (cond-> (count (last lines))
                     ;; count trailing newline when elided by split
                     (= last-c "\n") inc)]
          [rows cols])

I tried string/split-lines but that drops trailing newlines, messing up the row counts, which I was only able to fix with the barely document -1 argument to string/split.

I also experimented a little with line-seq, but it didn't look promising. First, I don't think it exists in ClojureScript, which probably excludes it right away. Second, it has the same trailing newline problems as string/split-lines. And finally, it didn't even seem much faster than the regex, although I didn't make it all the way to actually benchmarking it.

I suspect there's some lower-level string manipulation that would be faster than a regex. I'll keep exploring.

mainej commented 2 years ago

Uh wait... Why does the original algorithm return the same thing for both of these strings?

"\n"  ;; => [1 1]
"\na" ;; => [1 1]

I'd expect it to be:

"\n"  ;; => [1 0]
"\na" ;; => [1 1]

But maybe not... I guess in an editor, when you're on a new blank line, your cursor is on the first column. And after you add a character, that character is also in the first column. Not very intuitive to my brain.

mainej commented 2 years ago

OK, last note for today. Here's a lower-level version. It benchmarks at 231ms, a (somewhat disappointingly) small improvement.

        (let [s (string node)
              ;; indexes of newlines
              is (loop [idx -1
                        idxs []]
                   (if-let [i (string/index-of s \newline (inc idx))]
                     (recur i (conj idxs i))
                     idxs))
              s-length (count s)]
          [(count is)
           (if (seq is)
             (let [trailing-newline? (= "\n" (subs s (dec s-length)))]
               (cond-> (- s-length (last is))
                 (not trailing-newline?) dec))
             s-length)])

That cuts the difference between the tracked and untracked zippers by 61%. Or to think about it a different way, originally the tracked zipper took 3.8 times as long as the tracked zipper, and now it takes 2.1 times.

Breaking down times... of the 98% of the total time that the profiler spent in rewrite-clj, 53% was in extent. Slicing that 53% (in percents of total time):

35% (string node)
13% finding newlines
5% other

There may be more to eke out from newline counting—13% is still a sizable chunk—but I'm out of techniques. The next bottleneck in extent is (string node). And that one's gonna be harder to get around. Unless we can find a way to not call extent as often, we're a little stuck. Since next* visits every node, and string recursively calls itself on children nodes, you end up stringing the same node several times. Maybe there's an optimization there, though I'm not sure.

I'll also point out that something like 25% of the time is doing other custom-zipper maintenance (mostly in z/right*) that you don't need to do in an untracked zipper. If all we do is focus on extent, that'll be a lower bound on how close we can get them.

lread commented 2 years ago

Thanks for taking a peek @mainej, I'll let this simmer in my noggin for a bit before diving in any deeper.

mainej commented 2 years ago

Yeah, back to the hammock.

NoahTheDuke commented 5 months ago

This suggests the tracked zipper isn't zippy enough for clojure-lsp.

FWIW, when I started building clj-kondo I discovered that not using zippers at all was much better for performance, so I went that route.

@borkdude what does clj-kondo do instead of using zippers?

borkdude commented 5 months ago

I just manually process nodes, no zipping, too much waste of performance