clj-commons / rewrite-clj

Rewrite Clojure code and edn
https://cljdoc.org/d/rewrite-clj
MIT License
591 stars 55 forks source link

Create zipper at root node #189

Closed mainej closed 2 years ago

mainej commented 2 years ago

Problem/Opportunity Many files begin with copyright notices, documentation, or other kinds of comments and whitespace. z/of-string navigates past these to the first non-whitespace node (or to the root :forms node if the file is only whitespace.) z/find-last-by-pos navigates using z/next*, meaning it navigates "forward" from the current node.

Together this makes it unnecessarily challenging to navigate based on cursor position, especially to leading comment nodes.

Consider, for example, clojure-lsp. It starts with data provided by a text editor: the contents of a file and a cursor position. clojure-lsp wants to navigate to the node at that position. It doesn't know a priori whether the file did or did not start with comments, or whether it was entirely comments. It also doesn't know whether the cursor position was in the leading comments, or somewhere after them. So, after parsing with z/of-string, it can't call z/find-last-by-pos until it has taken all of these variations into account.

In other words, this tests fails:

  (is (= ";; hello\n"
         (-> ";; hello\n(def hi :ciao)"
             (z/of-string {:track-position? true})
             (z/find-last-by-pos [1 1])
             z/string)))

Proposed Solution I'd like a new function z/of-string* that stays at the root :forms node. With this function, z/find-last-by-pos will reliably find any node based on cursor position, even in leading comments. I believe this new function can be a copy of-string that uses edn* instead of edn.

Alternative Solutions There are a few workarounds.

  1. (-> text z/of-string z/leftmost*). This will navigate either A) to the first node in the file, regardless of whether it was code or whitespace, or B) to the root node if the file was only whitespace. In either case z/find-last-by-pos can find any node.
  2. (let [loc (z/of-string text)] (or (z/up loc) loc)). This will navigate to the root :forms node, taking into account the possibility that you were already there. This is an awkward way to undo the navigation that z/edn does.

Alternatively, this can be solved in calling applications. See, https://github.com/clojure-lsp/clojure-lsp/issues/1252 for a proposal to do this for clojure-lsp.

Additional context This was the root cause of https://github.com/clojure-lsp/clojure-lsp/pull/1251, which has been temporarily patched in clojure-lsp.

Action I'd be happy to submit a PR for z/of-string*.

lread commented 2 years ago

Thanks for the issue @mainej. Is this a dupe of #125?

mainej commented 2 years ago

Yep @lread, I'd call it a dupe. Sorry, I should have searched first. Would you like me to close this one? Would you like me to add of-string* as another potential solution to #125?

lread commented 2 years ago

No problemo! Even though it is a dupe, it is clearly written and easy to understand. ❤️

I think we can close. My feeling is that an explicit option or a new top is probably clearer than an of-string*. Would you agree?

lread commented 2 years ago

Oh wait... I see what you were doing there. Maybe of-string* is worth considering? The raw non-whitespace skipping version of of-string.

lread commented 2 years ago

This is kinda sorta also related to #70.

mainej commented 2 years ago

The raw non-whitespace skipping version of of-string

Exactly... I was proposing a function like this:

(defn of-string*
  "Create and return zipper from all forms in Clojure/ClojureScript/EDN string `s`. Starts at the root `:forms` node.

  Optional `opts` can specify:
  - `:track-position?` set to `true` to enable ones-based row/column tracking, see [docs on position tracking](/doc/01-user-guide.adoc#position-tracking).
  - `:auto-resolve` specify a function to customize namespaced element auto-resolve behavior, see [docs on namespaced elements](/doc/01-user-guide.adoc#namespaced-elements)"
  ([s] (of-string s {}))
  ([s opts]
   (some-> s p/parse-string-all (edn* opts))))
mainej commented 2 years ago

This is kinda sorta also related to https://github.com/clj-commons/rewrite-clj/issues/70.

And very distantly to #171.

lread commented 2 years ago

Ah. You are matching the of-node* (aka deprecated edn*) naming convention/behaviour, yes?

Then for consistency, this proposal option would also include a new of-file*?

mainej commented 2 years ago

You are matching the of-node (aka edn) naming convention/behaviour, yes?

Yes, although my one hesitation is that I'm not sure of-node* and edn* are ideally named. I could imagine someone looking at their names and assuming that, based on the meaning of * in other contexts, they'd be navigated to the first child of :forms, whether or not it was whitespace. Being left at the :forms node might surprise them. But setting that concern aside, yes, that precedent was my motivation for the name.

Then for consistency, this proposal option would also include a new of-file*

Yes, that would make sense. As with any library that has lots of composable functions, any "shortcut" quickly leads to a Cambrian explosion.

lread commented 2 years ago

The nice thing about the * is that it does imply a special behaviour. And if you are familiar with the other * fns you will assume it is about not skipping nodes. So those are good things.

If we took the options approach instead:

Meh. I think we have an existing convention. My feeling is trying to correct history (that might not be even terribly flawed) is more confusing than going with the flow. We already have a kitchen sink of functions, what's 2 more?

If we added these functions, we could also revise (-> (of-string..) z/up ...) advice for fns like postwalk. The advice would be to use of-string* instead, I think.

So yeah, I'd be fine with new of-string* and of-file* that do no automatic node navigation (matching the current behaviour of of-node*).

If we ever get to #70, it would only make sense for the non-starred versions (of-node of-string of-file).

Let's see if @borkdude can find any flaws/gotchas with this approach.

borkdude commented 2 years ago

I agree with the original proposed solution of-string*.

lread commented 2 years ago

Thanks for the review @borkdude, we'll also add of-file* for consistency.

@mainej, I can do this one if you like, I'd like to see how it affects docs and docstrings.

mainej commented 2 years ago

@mainej, I can do this one if you like

Sounds good @lread! I had started a commit but the docstring wording would be the biggest change, so best for you to get that how you want it. Thanks!

lread commented 2 years ago

Am looking into this and have been reminded, by the code, of a historical nuance. A bit on the nitty gritty side, but writing it down anyway, could be that future-me will appreciate it.

of-node

of-node* does neither of these.

I remember not liking the fact that of-node* did not do the wrapping too. But kept the behaviour in for backward compatibility.

Because of-string* and of-file* are not working at the node level, they will always wrap with :forms node.

Edit: this is nitty grittier than I remembered: parse-string-all and parse-file-all both return a :forms node which are used by of-string and of-file. So okay. This makes this maybe almost a :duck: comment?