clj-commons / formatter

Building blocks and discussion for building a common Clojure code formatter

Indentation #9

Open PEZ opened 5 years ago

PEZ commented 5 years ago

I, for one, start thinking about indentation the moment formatting is mentioned, hence this thread. As zprint's many indentation options show, there is a plethora of ways people indent Clojure code.

PEZ commented 5 years ago

My vote is on the indentation suggestions of @tonsky's Better Clojure Formatting. They are truly simple and can survive just about any development of the Clojure language:

Multi-line lists that start with a symbol are always indented with two spaces,
Other multi-line lists, vectors, maps and sets are aligned one space after the open delimiter.

Note: I updated the second rule to disambiguate it, as per @shaunlebron's comment below.

Pasting the examples from the article here.

(when something
  body)

(defn f [x]
  body)

(defn f
  [x]
  body)

(defn many-args [a b c
                 d e f]
  body)

(defn multi-arity
  ([x]
   body)
  ([x y]
   body))

(let [x 1
      y 2]
  body)

[1 2 3
 4 5 6]

{:key-1 v1
 :key-2 v2}

#{a b c
  d e f}

(or (condition-a)
  (condition-b))

(filter even?
  (range 1 10))

(clojure.core/filter even?
  (range 1 10))

(filter
  even?
  (range 1 10))

Another good reason to keep the rules as simple as this is that it will be a lot less to learn for newcomers. They can concentrate on learning the idioms of the language instead of idioms of indentation style. In fact I think most of us will benefit from the easier parsing of the code when we always know what indentation to expect.

A thumbs up on this comment means you vote for these suggestions. (I recommend reading the article before weighing in.)
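As a rough illustration of how little machinery the two rules need, here is a minimal Clojure sketch. The names `symbol-token?` and `child-indent` are made up for this example and do not come from the article or any existing formatter. The symbol check is deliberately crude, and measuring Rule 1 from the open delimiter when a list carries a prefix is an assumption.

```clojure
(require '[clojure.string :as str])

(defn symbol-token?
  "Crude check (an assumption for this sketch): true when the token
  looks like a symbol rather than a number, keyword, string, literal
  or nested collection."
  [token]
  (boolean
    (and token
      (not (re-find #"^[+-]?\d" token))
      (not (#{"nil" "true" "false"} token))
      (re-find #"^[A-Za-z*+!\-_?<>=/.]" token))))

(defn child-indent
  "Column (0-based) for the continuation lines of a multi-line collection.
  form-col is the column where the form starts, opener is its opening
  text (for example the one-character list opener or a two-character
  set opener), and first-token is the first element as a string."
  [form-col opener first-token]
  (let [delim-col (+ form-col (dec (count opener)))]
    (if (and (str/ends-with? opener "(") (symbol-token? first-token))
      (+ delim-col 2)     ; Rule 1: list starting with a symbol
      (+ delim-col 1))))  ; Rule 2: one space after the open delimiter

(child-indent 0 "(" "when") ; => 2
(child-indent 0 "[" "1")    ; => 1
(child-indent 2 "(" "[x]")  ; => 3, a multi-arity body
```

Note that the function never needs to know what `when` or `filter` means, only whether the first token is a symbol.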

zane commented 5 years ago

While I admire the spirit behind this effort I am against this particular proposal for the simple reason that code indented based on these rules conveys less information than code indented based on the rules in the Clojure Style Guide.

Let me explain.

Lisp language communities have traditionally embraced semantic indentation (indenting subforms differently depending on the symbol in the operator position) because it allows the reader to derive information about a form even if they don't recognize the symbol in the operator position.

For example, if the reader encounters a form where the operands after the first have been given visual prominence by being dedented, like so

(foo bar
  baz ; no longer vertically aligned with bar
  qux)

then the reader can assume that foo is a macro with body parameters like when, with-out-str, etc.

If the reader encounters a form where all the operands are aligned

(foo bar
     baz
     qux)

then the reader can assume that the form is not a macro with body parameters.

Without this visual cue the burden of differentiating between these two cases falls on the reader.

Admittedly implementing semantic indentation is harder than implementing the proposal outlined above, but it is effort well spent. It is an investment that will pay dividends every time code indented by that formatter is read.

zane commented 5 years ago

Additionally, vertically aligning function arguments spanning multiple lines has the added benefit of being easy to scan. A reader looking at one argument to a function need only move their eyes up or down within the same column to find the other arguments to that function.

Similarly a programmer with their cursor at the start of the first argument to a function need only move their cursor down to find the rest of the arguments to that function. This is important if one wants to use keyboard macros to record and replay a sequence of actions on each of the arguments.

claudiu-apetrei commented 5 years ago

Seems like it's already supported by editors https://twitter.com/CursiveIDE/status/1070984513163997189

Configured cursive that way and have been trying it out on my code. So far so good, actually wish I knew about this option sooner... been fighting with Cursive's macros indentation (resolve-as) every time I start a project, now it's just indented the way I want it to without any config.

Personally really like having fewer & simpler formatting rules than the current clojure-style-guide even if it means sacrificing semantic indentation hints and preferences (my eyes are still adjusting to the new way threading macros are indented).

Edit https://www.braveclojure.com/do-things/

Syntax Clojure’s syntax is simple. Like all Lisps, it employs a uniform structure, a handful of special operators, and a constant supply of parentheses delivered from the parenthesis mines hidden beneath the Massachusetts Institute of Technology, where Lisp was born.

For such a simple language, it feels a bit strange to have so many indentation rules to remember. The new proposal seems a bit more in tune with how Clojure syntax is presented.

shaunlebron commented 5 years ago

@zane wow, I didn't know indentation was a shortcut for differentiating functions and macros. Would the following be a decent compromise?

;; ✅two-space indentation
;; ✅child arguments are aligned => foo is function
(foo
  bar
  baz
  qux)

;; ✅two-space indentation
;; ✅first child separated => foo is macro
(foo bar
  baz
  qux)

This indentation system is still flexible in that it gives you full control of newline characters. I think that's probably the sanest way to create an indentation formatter without incurring the cost of including a semantic analyzer.

shaunlebron commented 5 years ago

Other multi-line lists, vectors, maps and sets are aligned with the first element (1 or 2 spaces).

Possibly ambiguous, since this would pass the check (not intended right?):

[ 1 2 3
  4 5 6] ;; <-- two space indentation to align with first element

I think the rule can be rephrased as "one space after the open delimiter", so it includes:

[1 2 3
 4 5 6]  ;; <-- one space indentation after the open delim "["

#{1 2 3
  4 5 6} ;; <-- one space indentation after the open delim "{"

Also works for delims with longer prefixes:

#?@(:clj [...]
    :cljs [...])

#queue[foo
       bar]

zane commented 5 years ago

@shaunlebron It's a little more subtle than differentiating between macros and functions. The rules in the Clojure Style Guide help readers determine whether or not the symbol in the operator position is a macro that has body parameters. I've updated my comment above to try to make that more clear.

For example, when has its body dedented:

(when (even? n)
  (println n)
  (inc n))

Similarly with-out-str also has its body dedented:

(with-out-str
  (println "foo")
  (println "bar"))

However, or does not even though it is a macro:

(or (foo? n)
    (bar? n)
    (= n 7))

zane commented 5 years ago

To take a step back, there's a reason why auto-formatters like gofmt are less popular in Lisp-like languages: In a constrained language like golang it's easy to enumerate all the "special forms" upfront and choose the best indentation rule for each. In Clojure however users can extend the language via macros, making the total set of "special forms" (and the number of different kinds of special forms) unbounded.

To me this raises the question of whether a one-size-fits-all indentation rule is even desirable. I certainly can't imagine all the different kinds of macros people might write, nor can I imagine all the kinds of information a macro writer might want to convey to people reading code that uses their macro. This is why (I presume) Cider cedes authority on the matter of indentation to the macro writer via indent specs.

arichiardi commented 5 years ago

Agree to the point that the rules above are simple therefore good. I also agree to the points @zane is making.

I guess if the goal of this issue is to establish "sensible defaultz" then probably something along the lines of the two rules might be good.

However I do not see myself using only those, realistically - that is why I really liked the approach that zprint has in defining "format like this" config options + the tons of options. For reference the tslint tool has way way too many options in comparison and folks have developed presets. I do not see it as a problem personally.

So maybe my question here would be, what would be the outcome of this vote?

shaunlebron commented 5 years ago

@zane great description of the problem, thanks for clarifying.

Interesting precedent: This made me write about how elm-format inspired prettier to let the user nudge the formatter for cases too ambiguous to automate. Thus, proposing:

Rule 3 Proposal

3. multi-line lists are indented to "first-arg alignment" if and only if the 2nd line is indented as such

✅ Triggered:

;; before
(foo bar
     baz             ;; <-- 2nd line is "first-arg aligned"
  qux)

;; after (rule 3 triggered)
(foo bar
     baz
     qux)

❌ Not triggered (falling back to two-space indentation from Rule 1)

;; before
(foo bar
 baz                ;; <-- 2nd line is NOT "first-arg aligned"
  qux)

;; after (rule 3 NOT triggered)
(foo bar
  baz
  qux)

Takeaways:

provides strict formatting for both legitimate indentation choices
specifies a well-defined nudge to let user pick (without explicit indent spec or analysis)

Since there's precedent for doing something like this in popular formatters, this might help break Lisp out of its unique indentation paralysis.
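To make the Rule 3 nudge concrete, here is a minimal Clojure sketch. The function names are hypothetical, and the sketch assumes a single list that starts at column 0, spans at least two lines and contains no nested multi-line forms. It only shows the decision, not a real formatter.

```clojure
(require '[clojure.string :as str])

(defn rule-3-indent
  "First-arg alignment if and only if the 2nd line already uses it,
  otherwise the two-space indentation from Rule 1."
  [open-col first-arg-col second-line-col]
  (if (= second-line-col first-arg-col)
    first-arg-col
    (+ open-col 2)))

(defn reindent-flat-form
  "Re-indents every line after the first according to Rule 3.
  lines is a vector of strings holding one flat multi-line list."
  [lines]
  (let [[head & tail] lines
        ;; column of the first argument, or nil when the first line
        ;; holds only the operator
        first-arg-col (some-> (re-find #"^\(\S+\s+" head) count)
        second-col (count (re-find #"^ *" (first tail)))
        indent (rule-3-indent 0 first-arg-col second-col)
        pad (apply str (repeat indent \space))]
    (into [head] (map #(str pad (str/triml %))) tail)))

(reindent-flat-form ["(foo bar" "     baz" "  qux)"])
;; => ["(foo bar" "     baz" "     qux)"]

(reindent-flat-form ["(foo bar" " baz" "  qux)"])
;; => ["(foo bar" "  baz" "  qux)"]
```

The only input that changes the outcome is the indentation of the second line, which is what keeps the nudge well-defined.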

tonsky commented 5 years ago

Semantic indentation is actually quite dangerous. Because it will make mistakes (yet unknown forms, lib author hasn’t provided indentation rules etc) and it will convey incorrect meaning. To me this is a no-go. I prefer something with less info but reliable 100% of the time rather than something with more info but a 10% chance it might be incorrect.

The only solution I see here is to let the code author decide indentation as she writes code (~Rule 3 proposed by @shaunlebron). But do you really need that info? If you see a when or an or form, do you really need indentation to help you figure out what they are?

I also have an argument against aligning arguments to the first one. Everything becomes too indented too quickly. Function names are usually not as short as foo/or/and. And they often include namespaces. If I have something like

(utils/format-paragraph p
                        out
                        opts)

everything is suddenly indented 24 spaces in, even though nesting level has just increased by 1.

Indentation ~ nesting level is another benefit coming from my proposal. I believe it makes code easier to read (you can approximate nesting just from a quick glance over indentation). It also makes code easier to format, as everything becomes much less spread horizontally.

tonsky commented 5 years ago

Just a reminder: we are not inventing just another formatter here. We are trying to find a solution that would work everywhere for everyone. I believe that makes relying on runtime information (like indentation rules specified on forms) a no-go.

shaunlebron commented 5 years ago

worth making this clear:

Rule 3's relation to clojure-mode:

@bbatsov's clojure-indent-style specifies three different styles, and branches further based on the placement of bar in examples below:

:always-align

(foo bar              ;; <-- formatted based on `bar` on 1st line
     baz)             ;;     (outcome A)

(foo
 bar                  ;; <-- formatted based on `bar` on 2nd line
 baz)                 ;;     (outcome B)

:always-indent

(foo bar              ;; <-- formatted based on `bar` on 1st line
  baz)                ;;     (outcome C)

(foo
  bar                 ;; <-- formatted based on `bar` on 2nd line
  baz)                ;;     (outcome D)

:align-arguments

(foo bar              ;; <-- formatted based on `bar` on 1st line
     baz)             ;;     (outcome A)

(foo
  bar                 ;; <-- formatted based on `bar` on 2nd line
  baz)                ;;     (outcome D)

Rule 3 allows outcomes A,C,D but not B.
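The comparison above can be written down as data. This is only a sketch of the table in this comment, not clojure-mode's implementation: the map, the `:first-line`/`:second-line` keys and the function names are invented for illustration.

```clojure
(def style->outcome
  "Outcome per style, keyed by where `bar` sits."
  {:always-align    {:first-line :A, :second-line :B}
   :always-indent   {:first-line :C, :second-line :D}
   :align-arguments {:first-line :A, :second-line :D}})

(def rule-3-outcomes #{:A :C :D})

(defn continuation-indent
  "Column for the lines after the first, given the column of the
  open paren and the column of the first argument."
  [style bar-placement open-col first-arg-col]
  (case (get-in style->outcome [style bar-placement])
    :A first-arg-col       ; aligned under the first argument
    :B (+ open-col 1)      ; one space
    (:C :D) (+ open-col 2))) ; two spaces

(defn rule-3-compatible?
  "True when every outcome the style can produce is allowed by Rule 3."
  [style]
  (every? rule-3-outcomes (vals (style->outcome style))))

(continuation-indent :always-align :second-line 0 5) ; => 1
(rule-3-compatible? :always-align)                   ; => false
(rule-3-compatible? :align-arguments)                ; => true
```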

kkinnear commented 5 years ago

In order to explore the ideas expressed by @tonsky, I hacked up zprint to do what I think he proposed in his blog post. I may, of course, have misinterpreted what he had in mind. I have put the resulting build artifacts in a release zprint-std-0.1.1

There are two graalvm (i.e. fast) binaries there, one for macOS and one for Linux, and an uberjar for everyone else. These things read from stdin and write to stdout.

Note that these are experimental, with almost zero testing. These are not production releases, these are prototypes to play with.

I did this for several reasons. In part to see how hard it would be (not very, since these are simple rules), and in part so that I could see what the results looked like beyond the trivial examples we all tend to use.

Do I think that this is the formatting approach that should be chosen? I'm mixed. It is future proof. I don't expect I'd be formatting my code this way anytime soon, as (unsurprisingly) I like what zprint does by default. That said, I ran this over all of clojure.core looking for some functions that would really look bad. I expected to find plenty, and I found only three or four that looked a bit bad. The only things that I think it seriously fails on are specs. But then pretty much anything without the zprint-style constant pairing is going to fail for specs.

If the code started out looking good, this approach mostly doesn't make it look terrible. If it started out looking terrible, this formatting approach will make it some better, perhaps enough better?

Try it on your code. See what you think.

Edit:

I managed to forget to work through how zprint's current comment handling interacts with the code I added for trying out @tonsky's formatting approach. If you have problems with comments messing up the output, try this, it should help:

./zprintm-std-0.1.1 '{:comment {:inline? false :wrap? false}}' <infile.clj >outfile.clj

If that doesn't fix things, please add to the issues in the zprint-std repo.

shaunlebron commented 5 years ago

@kkinnear is there code available for that?

i had a thought to fork parinfer's code to do this, but only if the rules stay simple (it can only handle indentation correction). small code size and annotated markdown test cases would probably help prototyping

arichiardi commented 5 years ago

@kkinnear thanks you are amazing! Trying that!

kkinnear commented 5 years ago

@shaunlebron, I have implemented this by hacking zprint to do this rather lightweight formatting approach. The code I wrote hijacks the existing zprint processing by calling a central routine (see below) which does the basic implementation for this formatting approach. I modified the code which does maps, lists, and vectors to call this routine instead of doing what they normally do.

I could push the branch I did this on if you think it would help you. Two reasons I haven't already done that:

Having done this work, were I to support and release it for general use, I would probably do it somewhat differently.
I didn't imagine that it would be all that useful for anyone else, since zprint isn't a particularly inviting prototyping environment, other than for me.

Doing these simple rules was sufficiently different than the other zprint implementation that I could just sort of parallel the existing implementation. Some of the more complex rules you are proposing would be a bigger challenge, since they need to know more. That doesn't mean they aren't a good idea, just that implementing them would be something to figure out.

This is the routine that is used for lists, vectors, sets, and maps. It returns the insides of the data structure, and the caller wraps the "(" and ")", or "[" and "]", or whatever around the output from this routine. Just FYI, the "fzprint" means "focused zipper print", since a feature of zprint that isn't widely used is that it will highlight or otherwise special-case an expression which is the "focus" of the output.

(defn fzprint-std
  "zloc is down inside a collection, operate on it and everything else 
  in the collection. Don't ignore any whitespace except after a newline, 
  when we will adjust the indent. ind is where we are on the line at this 
  time, and ind+indent is where the next line should go under this one."
  [options ind indent zloc]
  (dbg options "fzprint-std:" (zstring zloc))
  (loop [nloc zloc
         current-ind ind
         newline? false
         out [[]]]
    (let [#_#_nlen (count (zstring nloc))
          next-element (fzprint* options current-ind nloc)
          next-element (if (and newline?
                                (= (nth (last next-element) 2) :whitespace))
                         (conj (butlast next-element) ["" :none :whitespace])
                         next-element)
          [newline? next-len] (last-line-length next-element)
          next-element (if newline?
                         (conj (into [] (butlast next-element))
                               (let [element (last next-element)]
                                 ; rewrite the newline with proper indent
                                 [(str (trimr-blanks (first element))    
                                       (blanks (+ ind indent))) (second element)
                                  (nth element 2)]))
                         next-element)
          next-out (concat-no-nil out next-element)
          next-nloc (znext nloc)]     
      (if next-nloc
        (recur next-nloc
               (if newline? (+ ind indent) (+ current-ind next-len))
               newline?
               next-out)
        (do (dbg options "fzprint-std: return:" (pr-str (next next-out)))
            (next next-out))))))

If you want, I could push a branch of zprint which does the zprint-std processing. It would include that above code, a couple of other new functions, and changes to about 4-5 places in the existing zprint. It has no tests of any sort.

shaunlebron commented 5 years ago

Published parindent—parinfer repurposed to prototype these formatter rules:

👉 https://github.com/shaunlebron/parindent

Diffs from popular repos:

👀Things I noticed:

(:require ...) should allow arg-alignment or two-space indentation?
[:div ...] hiccups are often two-space indented
lots of lists want one-space indentation
lots of multi-arity function bodies are indented past the parameter vector

bbatsov commented 5 years ago

I don't want to repeat myself here, so I'll just link to https://clojureverse.org/t/clj-commons-building-a-formatter-like-gofmt-for-clojure/3240/85

TLDR;

Before jumping the gun on some universal indentation scheme I think it'd be beneficial to think:

What exactly should it be?
Is this ever going to fly?

Generally the farther you aim from what currently exists as community standards, the harder it's going to be to make something be widely adopted.

vemv commented 5 years ago

Hi there, I read @tonsky's http://tonsky.me/blog/clojurefmt which is having an influence on this thread, and I would like to contribute a constructive counterview.

Clojurefmt has to be everywhere. Any editor. Any language. Stand-alone tool. Browser. Libraries. If we want everyone to use it, we should give it to everyone. Nothing sucks more than “it works in Emacs but everybody else needs to figure out a way to run it”. Or “it’s written in Clojure so it takes 10 seconds to start up, practically unusable”.

This concern doesn't preclude writing the tooling in JVM Clojure.

graalvm exists
stuartsierra/component exists (you can write a dev component that reformats code on resets)
wkf/hawk exists (if you feel like a faster formatting cycle)
Editors and the Clojure formatter could communicate directly via nREPL or such.

Browser.

What's the use case for formatting clj code (not data) in the browser? Isn't it more like a misguided goal (led to by a desire of uniformity/simplicity)? One that could impact the quality of the rest of the solution (less accuracy, or more compromises and breakage from Lisp heritage)?

For example, I’d be happy if I could express the rules of clojurefmt in VS Code Auto Indent syntax

This is a misguided wish/goal and clearly it influenced the rest of your thinking. Something that can be expressed in VS Code Auto Indent syntax obviously will be less powerful than something that runs in Clojure, and preferably at runtime.

Comparing Clojure to Go is a mistake. Clojure has macros and other reasons why you can't have a 'dumb' instantaneous formatter that cold-starts. Like other things in Clojure (spec!), the right way is to embrace the runtime. Compare CIDER to Cursive for example - while both have unique strengths, CIDER is overall superior because it embraced the runtime. It's common for Cursive users to have unresolved vars, spurious warnings, unfixably bad formatting etc.

I have witnessed how Cursive users end up making up rules to compensate for these, rules that really don't relate to how Clojure/Lisp is written in the larger ecosystem. That would be a bad fate for clj-commons/formatter.

How would we know if the form has body params or not? Whitelisting doesn’t work. Parsing codebase to find macro source doesn’t either.

Whitelisting doesn’t work. -> It works, it's handy for when one is editing Clojure without an nREPL connection. Sure you won't get accurate support for every macro ever, but most macros come from clojure.core (80-20% rule). If not, one could use inference/conservatism: if something had a certain indentation and the formatter lacks necessary information, it could assume the indentation was right, and therefore not touch it.
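A minimal sketch of that whitelist-plus-conservatism idea, in Clojure. The two sets are tiny illustrative samples rather than a real catalogue of clojure.core, and all names are hypothetical.

```clojure
(def body-ops
  "Operators known to take body parameters (illustrative sample)."
  #{"when" "when-not" "with-out-str" "let" "do" "defn"})

(def aligned-ops
  "Operators known not to take body parameters (illustrative sample)."
  #{"or" "and" "filter" "map"})

(defn whitelist-indent
  "Column for continuation lines. Known operators get their rule;
  unknown ones keep whatever indentation the author already used."
  [op open-col first-arg-col existing-col]
  (cond
    (body-ops op)    (+ open-col 2)
    (aligned-ops op) first-arg-col
    :else            existing-col))

(whitelist-indent "when" 0 6 6)    ; => 2, body is dedented
(whitelist-indent "or" 0 4 2)      ; => 4, arguments are aligned
(whitelist-indent "unknown" 0 9 2) ; => 2, left as written
(whitelist-indent "unknown" 0 9 9) ; => 9, left as written
```

The last two calls show the cost of the fallback: two differently indented copies of the same unknown form both pass through unchanged.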

parsing codebase to find macro source doesn’t either. -> It works fantastically well if you embrace the runtime. Parsing without running code -> won't work indeed, but that's not a good goal for a Lisp. Furthermore, indentation metadata for macros will help.

[on Vertically align function (macro) arguments spanning multiple lines.] This rule would be easy to automate if it didn’t have a huge list of exceptions.

You aren't distinguishing between functions and macros, which makes me believe you didn't understand this rule. That way it sure will look like an exception-ridden one.

Again, exceptions! E.g. try doesn’t follow it.

You are comparing functions with special forms. As @zane points out https://github.com/clj-commons/formatter/issues/9#issuecomment-445785753 , once you understand the distinction, you'll find that this style helps programmers understand the code at a glance.

Comment on @tonsky's contributions to this thread:

Because it will make mistakes (yet unknown forms, lib author hasn’t provided indentation rules etc) and it will convey incorrect meaning.

Not if you embrace the runtime :) Also, consumers could perfectly well declare metadata for the cases where third parties didn't author it for their macros. And that metadata could be kept in a global repo a la DefinitelyTyped, for the edge cases of macro authors who won't update metadata.

Everything becomes too indented too quickly.

But it also is more readable. In:

(utils/format-paragraph p
                        out
                        opts)

You read the arguments vertically, which is relaxing for your eyes. Without vertical aligning, a huge zig-zag between p and out is created. Try tracking the movement of your eyes as you read the arguments. It's sort of an interruption to one's flow of thought (to put it another way: I don't want to read code as p [20 spaces] out opts, with this gap in between where my eyes are moving and my mind is wondering what the next argument will be).

tldr

It should be recognised that a Clojure formatter has to make compromises: it will be either smart or unconditionally-quick. I'd go for the smart route, and optimize things starting from there, which is perfectly possible.

I have built a hawk-based formatter in the past (closed-source sadly), which proves at least one way of giving every editor user the same formatting, instantly.

tonsky commented 5 years ago

Embracing the runtime is the worst thing that can happen to a tool like this.

I have to pay 30-60 second startup cost just to format a file? No way. I wasn’t happy with running cljfmt as a standalone tool too, as it still required a couple of seconds for each invocation.

Or if I work in a REPL already but broke my runtime somehow in a completely different place so that tools.namespace failed to load anything and now I can’t format the file I’m currently working on? What’s the appeal of that?

And what about incomplete files? I want the formatting to be right as I type, not after I finished. At this point forms are not complete yet, so no runtime can help me with that.

80-20% rule is a terrible goal to meet. What’s the point of universal formatter if it formats wrong/different for different people 20% of the time?

That’s why I wanted the tool to be easy to implement anywhere, not only in JVM environment. Browser (Atom, VS Code), Python (Sublime Text), JVM (Cursive), ELisp (Emacs) etc.

vemv commented 5 years ago

I have to pay 30-60 second startup cost just to format a file? No way.

This is a strawman because our main activity as developers is to develop, not to format. I typically will pay the 30-60s start time and get in exchange:

A clj repl
A cljs repl
A test runner
An autoformatter
A refactoring backend (clj-refactor)
A SCSS compiler
And more...

When well set up, these components will perform instantly. Sadly a lot of people are stuck thinking that lein whatever is the only way to do things, but in fact you can decomplect tooling from Leiningen. I did so this year in a few places, e.g. https://github.com/weavejester/cljfmt/pull/123 .

Worth adding, it is extremely common among Clojurists and Lispers to have REPL sessions that last days (particularly Emacs sessions). So the 30-60s cost becomes even more relative.

I wasn’t happy with running cljfmt as a standalone tool too, as it still required a couple of seconds for each invocation.

Take a look at integrating it with your main JVM process, as advocated above. Performance should be instantaneous.

Or if I work in a REPL already but broke my runtime somehow in a completely different place so that tools.namespace failed to load anything and now I can’t format the file I’m currently working on?

If your project's state broke, then you have a more important problem to solve first. After solving the higher-priority problem, code will be autoformatted with no intervention needed.

And what about incomplete files? I want the formatting to be right as I type, not after I finished. At this point forms are not complete yet, so no runtime can help me with that.

As briefly outlined, editors can communicate with a backend via nREPL. i.e. Atom can ask the JVM "hey, what's the correct formatting for this sexpr which I'll pass you on-the-fly?"

What’s the point of universal formatter if it formats wrong/different for different people 20% of the time?

I was explaining the case for nREPL disconnected, which is a corner one.

Browser (Atom, VS Code), Python (Sublime Text), JVM (Cursive), ELisp (Emacs) etc.

All those can (and do?) communicate with nREPL.

danielcompton commented 5 years ago

@vemv, thanks for your thoughts here. As a high level comment, I think that beyond the specific details, there is simply a matter of different sets of values clashing. It doesn't mean that either is wrong, just that different people have different values and goals when formatting their code.

What's the use case for formatting clj code (not data) in the browser? Isn't it more like a misguided goal (led to by a desire of uniformity/simplicity)?

The browser/JavaScript has an extremely wide reach, far wider than the JVM can ever get to. One example of formatting code in the browser is re-frame-10x. We get hold of code forms via macros, and then show the execution of code through that code in the browser.

Another use for having JS compatibility is the large number of JS based code editors, both as part of desktop applications like Atom and VS Code, and on the web like Nightcoders.

Reasonable people can disagree on the value of these things, and whether they are worth the tradeoffs that having JS compatibility would involve, but there are concrete use-cases.

For example, I’d be happy if I could express the rules of clojurefmt in VS Code Auto Indent syntax

This is a misguided wish/goal and clearly it influenced the rest of your thinking. Something that can be expressed in VS Code Auto Indent syntax obviously will be less powerful than something that runs in Clojure, and preferably at runtime.

I think misguided is probably not the best word to describe this. I think there is a difference of values, and what Nikita values here is different to what you are valuing. Again, that's totally fine, but let's recognise that.

From @tonsky: And what about incomplete files? I want the formatting to be right as I type, not after I finished. At this point forms are not complete yet, so no runtime can help me with that.

This is a very concrete use-case of how being able to express the formatting spec in VS Code Auto Indent rules would be beneficial.

When designing a formatting scheme, there is a wide range of possibilities in how much context and understanding of the codebase the formatter requires. The advantage of Nikita's proposal is that it doesn't require any understanding at all of the codebase beyond the current top-level form. This means that it is very easy to create multiple implementations targeting different tooling. The tradeoff here is the limited palette available for formatting the code.

One that could impact the quality of the rest of the solution (less accuracy, or more compromises and breakage from Lisp heritage)?

On this point, I think it's worth looking at Clojure itself and how it has in some places broken from its Lisp heritage, e.g. using [] in places where lists would have been used, function names (car, cdr), and many other things.

Clojure is part of the Lisp heritage, but I don't think that it is constrained by it. It is certainly very valuable to understand the Lisp heritage that we come from, as there are many hard-won lessons there. However just as Clojure wasn't constrained by what came before it, nor does this formatting tool have to be constrained by what came before it. However we do need to recognise @bbatsov's very good point, and avoid differences just for difference's sake, and recognise the costs of deviating from the norm:

Generally the farther you aim from what currently exists as community standards, the harder it's going to be to make something be widely adopted.

Sure you won't get accurate support for every macro ever, but most macros come from clojure.core (80-20% rule). If not, one could use inference/conservatism: if something had a certain indentation and the formatter lacks necessary information, it could assume the indentation was right, and therefore not touch it.

Two goals I had when I proposed this formatting project were:

to build a tool/formatting spec that can always give repeatable results, where the formatting of code doesn't change depending on which computer/tool/environment you are running on. You could achieve that goal in many ways, including forcing to always connect to a running REPL.
to build a tool/formatting spec that can give canonical results, where the formatting of the code can be as consistent as possible, and not affected by the indentation of the programmer, except where that indentation serves the purpose of hinting how the formatter should run.

For example, given the rule "if the formatter lacks necessary information, it could assume the indentation was right, and therefore not touch it." then in the case of unknown macro formatting rules, both of these formats would not be touched. This would go against goal 2. Perhaps the macro indentation spec could resolve this ambiguity though.

(unknown true
  (xyz))

(unknown true
         (xyz))
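A minimal sketch of that conservative rule, to show why it conflicts with canonical output. The lookup table `known-body-indent` and the function names are hypothetical, and the code only handles a two-line form; it is an illustration, not a proposed implementation.

```clojure
(require '[clojure.string :as str])

;; Hypothetical table of heads whose body indentation the formatter knows.
(def known-body-indent {"when" 2, "let" 2, "defn" 2})

(defn leading-spaces
  "Number of spaces at the start of a line."
  [line]
  (count (take-while #(= \space %) line)))

(defn reindent-second-line
  "Formats a two-line top-level form given as [first-line second-line].
  Known heads get the indentation from the table; for unknown heads the
  existing indentation is assumed to be right and is kept."
  [[first-line second-line]]
  (let [head   (second (re-find #"^\((\S+)" first-line))
        indent (get known-body-indent head (leading-spaces second-line))]
    [first-line
     (str (apply str (repeat indent \space)) (str/triml second-line))]))

;; A known head is normalised regardless of the input:
(reindent-second-line ["(when true" "         (xyz)"])
;; => ["(when true" "  (xyz)"]

;; An unknown head comes back exactly as written, so two differently
;; indented inputs stay different after formatting:
(reindent-second-line ["(unknown true" "  (xyz)"])
;; => ["(unknown true" "  (xyz)"]
(reindent-second-line ["(unknown true" "         (xyz)"])
;; => ["(unknown true" "         (xyz)"]
```

The output is repeatable (the same input always gives the same result) but not canonical, because the result depends on how the programmer happened to indent the unknown form.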

This is a strawman because our main activity as developers is to develop, not to format.

It's not a straw man, it's a difference in values. There are also many contexts where a running REPL isn't available or possible, or the cost is too high.

I was explaining the case for nREPL disconnected, which is a corner one.

This may be a corner case in your usage, but it's not a corner case for everyone. I often need to jump into Clojure projects to make a quick fix or start sketching out some code without wanting to have to wait to start up a REPL. There are also many newer Clojure programmers who haven't learnt how to interactively develop at the REPL.

In writing this reply, I realised there is another goal I had which I hadn't explicitly mentioned or identified:

I would like this tool to be usable for beginner Clojure programmers.

Intermediate and expert Clojure developers are able to set up a finely tuned REPL environment where everything runs in the same JVM process and they can achieve a very productive state. Clojure's ability to develop interactively is one of its strongest features. However, for a beginner to Clojure, this is too much to ask. A tool which requires a high startup time and has to load all of the code just to format it is a big barrier to those novice users.

I believe one of the current and future changes in programming culture is the adoption of consistent formatters like gofmt and prettier. Programmers new to Clojure are going to be looking for a tool like this, and if the only story we have is one that involves significant friction or adopting unfamiliar tools like Emacs then that is not going to be a compelling one.

@vemv I can see that you care deeply about this proposal. It's really useful having you here, giving your opinions and ideas. I don't have a big Lisp background, so I'm not as familiar with some of the background on formatting as you are. I would ask though, that you make arguments against other suggestions with the most charitable view, i.e. the principle of charity, and avoid words like 'misguided'. I'd like to be able to work together as a community to build something that as many people as possible can enjoy and use. To do so, we need to ensure that the process of building this tool stays positive and constructive.

danielcompton commented 5 years ago

Published parindent—parinfer repurposed to prototype these formatter rules:

👀Things I noticed:

(:require ...) should allow arg-alignment or two-space indentation?

[:div ...] hiccups are often two-space indented

lots of lists want one-space indentation

lots of multi-arity function bodies are indented past the parameter vector

Apart from the indentation on (:require ...), this all looks pretty reasonable. Even there, one option could be to suggest breaking the list so required namespaces start on the line below. Certainly there are some differences, but after a quick scan through the diffs it all looked ok. I noticed a lot of Figwheel's top level forms were indented with four spaces, e.g. https://gist.github.com/shaunlebron/791da3a0f8f1ce66e033ab74c6743070#file-figwheel-diff-L7159-L7233. It looks like this is because of the reader conditionals wrapping the rest of the file. I'm not sure how common a case this is, though you could make an argument that indenting it is better anyway as it makes it clearer to the reader that something different is happening here.

I appreciated that whitespace between forms was preserved. While I usually don't bother with this kind of thing myself, I know that others find it really useful: https://gist.github.com/shaunlebron/1560623142bdd4842f6c714bc2518936#file-rum-diff-L71-L84. I'm not sure whether this would conflict with other formatting goals, but it is a neat thing to be able to offer.

vemv commented 5 years ago

@danielcompton Thank you for your thoughts and moderation.

The browser/JavaScript has an extremely wide reach, far wider than the JVM can ever get to.

Got your examples, sounds good. One still could build a cljs-backed formatter that isn't instantaneous (but rather, "runtime-embracing" as I call it), making the tradeoff of 'smartness' (for lack of a better term) for speed.

My point wasn't about the JVM, it's about building stuff that can understand our code. A formatter that cold-starts and does its job in under 1 second cannot possibly be smart, and there will be sacrifices. Correct?

I think misguided is probably not the best word to describe this.

I'll just briefly note that Nikita uses a vocabulary including terms such as worst, no way, terrible, stupidest etc.

Also he expressed in his blog post stuff that was misleading or ignorant. I already backed that by argumentation (no function-macro-specialform distinction from his side).

There's also the difference that he has pretty high clout. There's a risk when influential people express highly subjective, breaking things - we can end up adopting the wrong thing, with a lot of pain in the way (related: every hype cycle ever. BigCorp pushes tech x, ideas are not that polished, stuff turns out to be inflexible and ends up abandoned).

Anecdotally, once at work we actually followed http://tonsky.me/blog/readable-clojure . While mostly OK, the guide has some rough/unconventional edges, and I wouldn't want to encounter the same sort of thing again.

The tradeoff here is the limited palette available for formatting the code.

We agree here.

There are also many contexts where a running REPL isn't available or possible, or the cost is too high.

And who decides when the cost is too high compared to the cost of using an oversimplified, breaking styleguide? As you say there's a difference in values. It would be somewhat unfair (best-case), or unsuccessful (worst-case) that a handful of high-clout folks decided what the right balance is.

The "zero-conf" goal makes things even harder.

There's a possible successful path which starts by creating smart tools that evaluate (or parse, worst-case) and understand our code. Could be backed by the JVM or Node, doesn't matter that much. I don't think it would conflict with your goals - only perhaps the tooling would be different to something like gofmt. Might require more effort as well.

I often need to jump into Clojure projects to make a quick fix or start sketching out some code without wanting to have to wait to start up a REPL.

I think Rich makes a series of good warnings (at least in 3 talks) against tooling infatuation and addiction to quick, easy feedback.

Sometimes fixing things will take me 5 mins instead of 1. Or my formatting will be slightly broken for a few commits. Are we sure we aren't entering OCD territory? At what cost? (OCD-y dev here, no attack intended)

There are also many newer Clojure programmers who haven't learnt how to interactively develop at the REPL.

Same here, Rich points out that tools might take some time and investment to learn. Nobody learns the violin overnight. That's not to say one should go for elite Emacs tools or such. Both Cursive and Spacemacs make a lot of things reasonably easy.

Should we lower the bar for everyone else just for accommodating a (somewhat hypothetical) beginner persona?

I'm aware that this exchange was a bit emotionally charged. No drama intended. I just seek to contribute, with somewhat qualified design suggestions. There's definitely a need and a chance for better formatting, but I'd kindly invite you to frame it as something to be deeply community-driven.

Perhaps such effort will take 1-2 years, not unlike some of the best tooling that we enjoy today in Clojure land!

danielcompton commented 5 years ago

I'll just briefly note that Nikita uses a vocabulary including terms such as worst, no way, terrible, stupidest etc.

I can see what you're saying here and take your point. I don't want to get too into the weeds on tone and moderation, but the reason I mentioned 'misguided' was that the word was describing people rather than ideas. What you were saying wasn't a big issue, but I wanted to steer the conversation back towards more positive terms. For clarity, I'd like everyone here to try and stay positive when we disagree on things. This is a topic where lots of people have lots of strong opinions, and that's great. We just need to make sure that we can work together to come to the best solution.

There's also the difference that he has pretty high clout. There's a risk when influential people express highly subjective, breaking things - we can end up adopting the wrong thing, with a lot of pain in the way

Nobody's ideas should have any more weight than anyone else's based on how well known they are in the community. We need to evaluate all of the ideas on their own merits.

And who decides when the cost is too high compared to the cost of using an oversimplified, breaking styleguide? As you say there's a difference in values. It would be somewhat unfair (best-case), or unsuccessful (worst-case) that a handful of high-clout folks decided what the right balance is.

To be clear, no decision has been made on anything yet. This issue tracker, along with the discussion on ClojureVerse, is the process of deciding on values and goals. There shouldn't be a handful of high-clout folks deciding what the right balance is; as you rightly say, that is unlikely to lead to a great outcome.

I think Rich makes a series of good warnings (at least in 3 talks) against tooling infatuation and addiction to quick, easy feedback. Should we lower the bar for everyone else just for accommodating a (somewhat hypothetical) beginner persona?

This is a good question and one we need to wrestle with. I didn't mean to say that my perspective was the correct one, or that we should favour one-off code fixes over people working for eight hours a day, but to bring up some more contexts where formatting matters. It's possible that we weigh the different tradeoffs and end up in a place where the beginner experience is less than ideal in exchange for a more powerful expert experience. But first, we need to weigh the pros and cons for the different stakeholders involved. I don't think (and I don't think you think this either) that everything we do to make things easier for beginners necessarily makes things less powerful for experts. Colin Fleming has had a few good comments on Reddit about improving the beginner experience of Clojure, and how it doesn't need to come at the expense of expert users.

I'm aware that this exchange was a bit emotionally charged. No drama intended.

And none taken 😄

There's definitely a need and a chance for better formatting, but I'd kindly invite you to frame it as something to be deeply community-driven.

Perhaps such effort will take 1-2 years, not unlike some of the best tooling that we enjoy today in Clojure land!

Absolutely, I'll make sure that it's clear that this is a community driven project (like all of CLJ Commons). I'm in no hurry for this tool to be created right now, I agree that it's more important to spend extra time on this, rather than rushing ahead.

shaunlebron commented 5 years ago

Been trying relaxed rules to allow 1-space, 2-space or Arg-alignment (chosen by first sibling line).

Made a live page for trying these out (edit on left, formatted output on right): https://shaunlebron.github.io/parindent/

kkinnear commented 5 years ago

@shaunlebron, thanks for parindent. It is interesting to see how you interpreted the rules! I have a suggestion and a question.

I'm a little confused by the rules in your repo, since they don't mention the bit about 2 space indents when the first element of a list is a symbol. But it seems that you have implemented that in parindent. My suggestion was to broaden that to include keywords in addition to symbols, since they are also functions. When I tried parindent, it wouldn't seem to allow arg alignment to be chosen by the first sibling line when the first thing in the list was a keyword.

From playing a bit with parindent, I believe that it doesn't actually change the number of spaces except for those spaces after a new-line and before the first non-whitespace character on the line. Is that true? That isn't a judgement or anything -- I'm just trying to figure out what the rules really mean. Thanks!

shaunlebron commented 5 years ago

@kkinnear thanks for your comments:

Yes, symbol detection was removed for now, since two-space and arg-alignment weren't particular to symbols—as you said, it's sometimes desired for keywords (:require or hiccups like :div). Rather than extending to just keywords, I decided to extend it to everything else in case we missed other cases (and dial it back after more info is gathered)
Yes, only the "indentation" spaces are modified (between newline and non-whitespace). It's just the most basic, forgiving formatter that only cares about this particular GitHub issue topic (indentation). Just an ad-hoc tool to facilitate discussion though—and prepared to throw it away later.

shaunlebron commented 5 years ago

To move the discussion along: if oversimplified rules are too restrictive, and runtime analysis is too heavy, what do people think of making the oversimplified rules more forgiving by allowing 1-space, 2-space or arg-alignment?

Specifically, do people like the idea of an indentation formatter that allows all the following?

;; one space
(foo bar baz
 qux)

;; two space
(foo bar baz
  qux)

;; first arg
(foo bar baz
     qux)

;; second arg
(foo bar baz
         qux)

This almost seems completely useless as a formatter to allow all of these, but it's probably important to fail left and fail right when figuring out what this should do. If the original rules are too restrictive, and these are too forgiving, then at least we know where the goal posts are.
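To make the "too forgiving" end of the goal posts concrete, here is a rough sketch of what such a relaxed check could look like. The function names are made up for illustration, and the code assumes space-only indentation and a list that starts at the first non-space character of its opening line; it is not how parindent is implemented.

```clojure
(defn leading-spaces
  "Number of spaces at the start of a line."
  [line]
  (count (take-while #(= \space %) line)))

(defn token-starts
  "0-based columns at which a space-separated token begins in `line`."
  [line]
  (->> (map vector (range) (cons \space line) line)
       (keep (fn [[col prev ch]]
               (when (and (= prev \space) (not= ch \space)) col)))))

(defn allowed-indents
  "Set of columns at which the next line of the list opened on
  `first-line` may start: one space, two spaces, first arg, second arg."
  [first-line]
  (let [[paren-col & arg-cols] (token-starts first-line)]
    (into (sorted-set (+ paren-col 1) (+ paren-col 2))
          (take 2 arg-cols))))

(defn allowed-line?
  "True when `next-line` starts at one of the permitted columns."
  [first-line next-line]
  (contains? (allowed-indents first-line) (leading-spaces next-line)))

(allowed-indents "(foo bar baz")
;; => #{1 2 5 9}
(allowed-line? "(foo bar baz" "  qux")
;; => true
(allowed-line? "(foo bar baz" "   qux")
;; => false
```

A stricter rule set would shrink the set returned by `allowed-indents`; a fully canonical formatter would reduce it to a single column.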

edit: sorry if my bowing out is rude, but I think I have contributed all that I can offer to this discussion. my thoughts here are informed by years of thinking about parinfer, and i'm certain that there is a good solution somewhere in the mix, given the collective wisdom from all corners of clojure here to solve this interesting problem. cheers

markx commented 4 years ago

Before we finish this discussion, if we ever get there, is there currently a way to use cljfmt or zprint or something else to achieve tonsky's style?

kkinnear commented 4 years ago

I am in the later stages of enhancing zprint to support (as a high level ":style") what I perceive to be @tonsky 's style with the inclusion of my understanding of @shaunlebron 's rule 3. Later stages means that the implementation is essentially complete except for the configuration for enabling/disabling rule 3. Documentation for this is also not yet complete, though given the simplicity of the approach I imagine it to be fairly straightforward. That said, I am also in the middle of additional major rework to zprint based on some of the conversations here and in Clojureverse last winter as well as a number of requests for enhancements to zprint, and so it will probably be several more months before any of this is released.

PEZ commented 4 years ago

Great to hear @kkinnear! Does any of the rework mean that I can get help with keeping track of the current cursor position after a reformat? I'm asking because I am right now struggling with a bug in my hacky code for it, and I do not seem to be able to figure the bug out...

kkinnear commented 4 years ago

We talked before about how we might communicate about cursor position. Why don't we continue that conversation off of this list. My email address is listed at the kkinnear github repo, the one for zprint. If you send me email, we can work out what would be best for your use case. Thanks!

camstuart commented 4 years ago

As a newcomer to Clojure, and a Go programmer, my opinion is "pick a style, and get used to it" just like us newcomers are told with parens.

Keep it:

1. Simple (no complex rules)
2. Accessible (available to all tools, thanks to point 1)
3. Consistent (everybody uses it, because of points 1 and 2)
4. Built in (in an ideal world, makes point 2 easier)

I too agree with what has been said in this article: "Better Clojure formatting"

But if being built in is an issue because of JVM warm-up time, then perhaps this could be a function available from the repl??

The point I'm attempting to make is that like parens, a consistently used formatting style is something you can get used to very quickly.

If one is not adopted by the community, then every piece of syntax that has a different formatting style will make readability suffer, and motivate us to write GitHub issues and blogs about it.

Instead of syntax formatting fading into the background along with parens where it should be.

vemv commented 4 years ago

FWIW, at work we've had https://clojars.org/formatting-stack in the oven for 10 months now. People using Emacs, Cursive and Fireplace alike are using it.

Will be properly open-sourced (aside from its current clojaring) soon. Personally I care a lot about not contributing even more noise to the scene by sharing solutions that aren't 100% complete.

then perhaps this could be a function available from the repl??

This is how formatting-stack works :) among other ways of invocation.

ro6 commented 4 years ago

TL;DR

Beginners benefit more from a ubiquitous standard than experts are affected either way. There are more of them, and they are entering the Clojure community from a wider range of backgrounds over time. @tonsky's criteria fit the expectations of a formatter from the wider programming world, as well as the stated goals of this project, and have a better chance of reaching ubiquity.

User Experience

As a beginner, having a decision-free, community-preferred code format means one less set of choices when starting out. The closer to 100% ubiquity the better, since:

I don't feel compelled to change my editor or environment to start learning Clojure (my familiar tools are supported as well as any others).
Code I encounter is more likely to be formatted to the standard, regardless of its origin or where it is posted.

As an expert:

I'm annoyed by a weak formatting tool, because my preferred style is better in ways X, Y, and Z, and I'm used to powerful, customizable tools that understand my code.
I enjoy having the option to save time by using the standard instead of thinking or arguing about it.
I'm on a team of Lispers, so we can still fight it out over an internal standard just like the status quo (as apparently @vemv and co have done with formatting-stack) OR I'm the senior Lisper on my team, so I can dictate how we work
I can still do whatever I want because my editor's written in Lisp and my REPL is running...

Ubiquity

When it comes to standardization, weakness is strength.

Why do we pick JSON over XML? It is weak and ubiquitous.

Why is C thriving after 35 years of C++? It is weak and ubiquitous.

Powerful, complex standards don't survive without strong authority (e.g. the C++ Standards Committee), and they don't get ported without financial backing. We aren't going to have those in Lisp land, nor do we want them.

Context

Clojure has quietly become a cross-platform standard for programming (see @stuartsierra's blog post, Arcadia, and even Clojerl). I love this since it multiplies my reach, and sometimes I even get some code reuse. It also attracts developers from the various communities we reach. We need to remember that the REPL and tooling the way we use it doesn't even exist as a concept in many communities. Even when well-explained, it's hard to really understand how good it is until you use it for a while (the Blub Paradox).

There are several mental chasms to cross with Clojure, most of them with an "aha" moment at the end, justifying the effort. Syntax formatting isn't one of them. We don't do it categorically better than other communities, so it isn't a point of departure worth the risk of losing a beginner. Once a person reaches the point of understanding data+REPL-driven development, Paredit, and other significant aspects of the Clojure ecosystem, they can make their own decisions about advanced syntax tooling.

A weak, ubiquitous format can be a baseline for the community, but doesn't need to prevent any individual or group from making a different decision for their code. Having a common baseline that truly runs everywhere Clojure code lives would be strictly better than the status quo.

vemv commented 4 years ago

I'd indicate that the 'beginner' persona is often overloaded. It could mean a college student, a junior engineer, a mid-to-senior-level engineer new to a given language... And even within each of those categories, you will find people who care about formatting in very varying degrees.

Who can speak for all of them?

Having a common baseline that truly runs everywhere Clojure code lives would be strictly better than the status quo.

Seems far from objective. It seems more likely to me that we'd end up with an xkcd 927 sort of scenario. Is a standard for beginners and people who don't feel compelled to change their editor really a standard?

Personally I'd want to believe that one can "start small" by creating a configurable formatter that can appeal to a wide variety of Clojurists, for the simple reason that it already conforms to their preferences. Later, finding the "one true indentation" would be simply a matter of agreeing on a baseline .edn file, then fostering it.

Contrariwise, building a bespoke formatter around a specific notion of one-true-formatting seems to imply a far riskier path for adoption. Basically it conflates the desired rules with a specific implementation.

This start-small approach is already a reality with the mentioned formatting-stack project, which my colleague @thumbnail and I keep refining. It's really done, but prior to a public release we keep emphasizing agnosticism (e.g. Eastwood or Kondo for linting? Why not allow choice? Or even a hybrid, where each part can fill gaps in the other?).

Btw I have it in the hammock to "donate" this project to clj-commons after release, further emphasising that it's a tool for everyone to enjoy/hack, and not a vehicle for delivering any particular individual's opinions. Let's see if you like it first :)

frou commented 3 years ago

I am in the later stages of enhancing zprint to support (as a high level ":style") what I perceive to be @tonsky 's style with the inclusion of my understanding of @shaunlebron 's rule 3.

@kkinnear Sorry for reviving an old thread. But, just for the benefit of google-searchers, is that what ended up being {:style :indent-only} in zprint?

kkinnear commented 3 years ago

Yes, actually. While zprint originally ignored all white space not at the top level of a file, now there are four "styles" that zprint supports -- 3 of which take varying degrees of notice of the existing white space. You can see lots of examples using the links below, or in the totally revamped zprint documentation. Briefly, the four styles are as follows:

{:style :indent-only} which is (as described above) what I perceive to be @tonsky 's style with my understanding of @shaunlebron 's rule 3. It doesn't change lines, but does properly indent every line, and recognizes where the third thing in a list sits with respect to the second thing in a list when making indentation decisions, so that this:
```
(this is
  a
        test)
```
will become:
```
(this is
  a
  test)
```
instead of:
```
(this is
a
test)
```
{:style :respect-nl} which "respects new lines" -- every line break is kept, and any additional ones necessary to try to keep within the :width are added.
{:style :respect-bl} which "respects blank lines" -- blank lines everywhere are kept, and otherwise formatting proceeds as it normally does.
Classic zprint -- where, except for blank lines at the top level of a file, all white space is ignored. Well, that's not quite accurate -- inline comments are aligned based on whether they were aligned in the input.
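For readers who want to try one of these, the style is selected through the zprint options map. A minimal configuration sketch, assuming the options live in a `.zprintrc` file in the home directory (check the zprint documentation for where it actually looks for configuration):

```clojure
;; ~/.zprintrc (assumed location; the file holds a single EDN options map)
;; Leave :style out entirely to get classic zprint behaviour.
{:style :indent-only}

;; The other two styles described above would be selected the same way:
;; {:style :respect-nl :width 80}
;; {:style :respect-bl}
```

The `:width 80` entry is only there to show where the width mentioned under `:respect-nl` is set; the value is illustrative.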

Anyway, that's more than you probably wanted to know...

bbatsov commented 3 years ago

With several years of delay I'll post here the case for Semantic Clojure Formatting as well.

TL;DR

The semantic indentation rules are not complex (and inferring the right semantic indentation by tools is not exactly rocket science either)
Optimizing for tooling over optimizing for humans is not the right approach for me

As for the beginner persona - in both cases (semantic or fixed formatting) we're literally talking about two trivial rules. I can't imagine how discarding the different semantics makes the situation for beginners simpler. Given the lack of real progress on the unification front in the past couple of years I'm even more convinced this is not worth it, and we should just agree that our formatters need to be configurable and everyone should do whatever they want (and aim for consistency in their own projects).
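To compare the two approaches side by side, here is a rough sketch of how each picks the indentation of a continuation line. It reflects one reading of the semantic rules (two spaces for forms with a body, otherwise align with the first argument, or one space if there is none) and of the fixed rule from earlier in the thread (always two spaces for symbol-headed lists). The `body-forms` set is a small hypothetical sample, and the function names are invented for illustration.

```clojure
;; Hypothetical sample of heads that take a body; a real tool would need
;; a fuller table or per-macro metadata.
(def body-forms #{"when" "let" "defn" "do" "if-let"})

(defn parse-opening
  "Splits an opening line such as \"(filter even?\" into the paren column,
  the head symbol, and the first argument's column (nil if none)."
  [line]
  (let [[_ lead head gap] (re-find #"^(\s*)\((\S+)(\s+\S)?" line)
        paren-col (count lead)]
    {:paren-col paren-col
     :head      head
     :arg-col   (when gap
                  (+ paren-col 1 (count head) (dec (count gap))))}))

(defn semantic-indent
  "Indentation column for the next line under the semantic rules."
  [line]
  (let [{:keys [paren-col head arg-col]} (parse-opening line)]
    (cond
      (body-forms head) (+ paren-col 2) ; body forms: two spaces
      arg-col           arg-col         ; calls: align with first argument
      :else             (+ paren-col 1)))) ; no argument on the first line

(defn fixed-indent
  "Indentation column for the next line under the fixed two-space rule."
  [line]
  (+ (:paren-col (parse-opening line)) 2))

(semantic-indent "(when something") ;; => 2
(semantic-indent "(filter even?")   ;; => 8
(semantic-indent "(filter")         ;; => 1
(fixed-indent "(filter even?")      ;; => 2
```

Both functions are a few lines long; the practical difference is that the semantic version depends on knowing which heads belong in `body-forms`, while the fixed version needs no such table.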