BetterThanTomorrow / calva

Clojure & ClojureScript Interactive Programming for VS Code
https://marketplace.visualstudio.com/items?itemName=betterthantomorrow.calva
Other
1.67k stars 217 forks source link

Indent not working as expected in this specific case #1686

Open bpringe opened 2 years ago

bpringe commented 2 years ago

With the below code, if I place my cursor after the closing ] for the let binding form and hit enter, the cursor ends up where denoted by | which is not expected.

Actual:

(let [x {:some-top-key
         {:some-things
          {2021
           {:my-year 2021,
            :my-things
            [{:some-number 200000, :some-date #inst "2021-11-15T00:00:00.000-00:00"}
             {:some-number 200000, :some-date #inst "2022-02-15T00:00:00.000-00:00"}
             {:some-number 300000, :some-date #inst "2022-05-16T00:00:00.000-00:00"}
             {:some-number 300000, :some-date #inst "2022-07-16T00:00:00.000-00:00"}]},
           2022
           {:my-year 2022,
            :my-things
            [{:some-number 200000, :some-date #inst "2022-11-15T00:00:00.000-00:00"}
             {:some-number 200000, :some-date #inst "2023-02-15T00:00:00.000-00:00"}
             {:some-number 300000, :some-date #inst "2023-05-16T00:00:00.000-00:00"}]}},
          :some-other-things
          [{:some-thing-id "1",
            :some-year 2021,
            :some-person "someone",
            :some-date #inst "2021-11-13T00:00:00.000-00:00",
            :foo :bar}]}}]
            |)

Expected:

(let [x {:some-top-key
         {:some-things
          {2021
           {:my-year 2021,
            :my-things
            [{:some-number 200000, :some-date #inst "2021-11-15T00:00:00.000-00:00"}
             {:some-number 200000, :some-date #inst "2022-02-15T00:00:00.000-00:00"}
             {:some-number 300000, :some-date #inst "2022-05-16T00:00:00.000-00:00"}
             {:some-number 300000, :some-date #inst "2022-07-16T00:00:00.000-00:00"}]},
           2022
           {:my-year 2022,
            :my-things
            [{:some-number 200000, :some-date #inst "2022-11-15T00:00:00.000-00:00"}
             {:some-number 200000, :some-date #inst "2023-02-15T00:00:00.000-00:00"}
             {:some-number 300000, :some-date #inst "2023-05-16T00:00:00.000-00:00"}]}},
          :some-other-things
          [{:some-thing-id "1",
            :some-year 2021,
            :some-person "someone",
            :some-date #inst "2021-11-13T00:00:00.000-00:00",
            :foo :bar}]}}]
  |)
bpringe commented 2 years ago

I tried deleting certain parts to narrow it down to a smaller number of lines for repro, but it kept working as expected when I removed things. I'm not sure if the length of content matters, or why it would.

PEZ commented 2 years ago

Very interesting! I managed to minimize it down to not being about the data, but about the structure:

[x {:a
    {:a
     {:a
      {:a 1
       :b
       [1
        2
        3
        4]}
      :b
      {:a 1
       :b
       [1
        2
        3]}}
     :b
     [{:a 1
       :b 1
       :c 1
       :d 1
       :e 1}]}}
       |]

It is also about the newlines:

[x {:a
    {:a
     {:a
      {:a 1
       :b
       [1
        2
        3
        4]}
      :b
      {:a 1
       :b
       [1 2
        3]}}
     :b
     [{:a 1
       :b 1
       :c 1
       :d 1
       :e 1}]}}
 |]

(The difference is at [1 :a :a :a :b :b], where I've moved 2 up after 1.)

And it is our TypeScript indenter that is behaving like this. If I use the ClojureScript one, things for as expected. (Unchecking Calva-fmt: New Indent Engine makes us fall back on the formatter which is written in ClojureScript.)

PEZ commented 2 years ago

Actually it is probably only about the newlines:

https://github.com/BetterThanTomorrow/calva/blob/dev/src/cursor-doc/indent.ts#L53

We should performance test this a bit and see what values we think we can afford here. Pinging in @corasaurus-hex because I just uttered performance.

corasaurus-hex commented 2 years ago

poofs into existence in a cloud of smoke and sparks

Oh, hi!! 💜

That spell has many incantations that are expensive that we could probably make cheaper.

So, for micro-optimizations:

  1. Switch == to === (doesn't try to cast values when using ===)
  2. using constant values where you can (like lodash.keys(rules) really only needs to be called once, ['id', 'kw'] allocated only once, etc). I'm sure the JIT gets some of these but we can help it along.
  3. Use built-ins like Array.prototype.find instead of lodash.find or Object.getOwnPropertyNames instead of lodash.keys. iirc the built-ins perform much better and we're not using the extras lodash provides with their function calls.
  4. It's usually cheapest/fastest to use Map.prototype.has or key in object or Set.prototype.has or Object.hasOwnProperty for checking to see if an object is included in a group of values https://jsbench.me/yql26frz99/1 https://jsbench.me/yql26frz99/2 so ['id', 'kw'] could use one of these (as a constant that is only allocated once).
  5. Check numbers before checking string equality. for example, (prevToken.type == 'open' && prevToken.offset <= 1) should be (prevToken.offset <= 1 && prevToken.type == 'open')
  6. Set up the regexp used in testCljRe ahead of time.
  7. Kind of a wild idea, but what if we used numeric constants for things like prevToken.type since number equality checks are so cheap. If comparing two strings that are different lengths then string comparison should have similar performance to number comparison (since both are just checking number equality) but as soon as you start having to compare characters the string comparison will be more expensive.

All of those should be confirmed with profiling, of course, and might not yield tremendous gains.

The biggest wins, though, are likely to be in algorithm changes. There's a lot going on here that I'm not really clear on so I'm not sure where to head with this.

All that said, we can profile to see what an acceptable number of lines to backtrack is. What kind of performance is acceptable?

PEZ commented 2 years ago

Hello there! Glad you notice when you are summoned! ✨

I'd be ready to allow it to be as slow as it is using the formatter, which we used to do, without people complaining. We could start with measuring that performance for what compares to 250 lines of backtracking. And compare it to what the indenter does if we bump to 250 lines.

There is also the maxDepth variable, but I think that it is kind of fine with 3, but then again, not all that sure what it does, actually.

Knowing the guy who wrote that indenter to begin with, I wonder how much algorithmic room there is to improve it. When I fixed some bug with the indenter a while ago I fixed one glaring thing, similar to 2 and 6 in your list.

I've been thinking about 7 before. Not for performance reasons, but for the brittleness with using strings like that. I've been thinking that enums could maybe solve that, but I'm not super savvy with those things so I have never acted on it.

corasaurus-hex commented 2 years ago

enums are pretty slick! https://www.typescriptlang.org/docs/handbook/enums.html#numeric-enums

PEZ commented 2 years ago

That looks perfect to me. We are using them here and there already in the codebase, so I don't know why I have been thinking of them as advanced...

PEZ commented 2 years ago

And slick indeed. Not as slick as Clojure keywords, but slick anyway. =)

corasaurus-hex commented 2 years ago

There are also Symbols but I imagine numeric comparison is going to win here by a long shot. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol