clojure-emacs / cider

The Clojure Interactive Development Environment that Rocks for Emacs
https://cider.mx
GNU General Public License v3.0
3.52k stars 643 forks source link

`cider-docstring--format` breaks some formatting #3709

Closed katomuso closed 3 weeks ago

katomuso commented 4 weeks ago

Currently, the cider-docstring--format function used for formatting docstrings breaks some formatting by moving every single paragraph (indicated by .) to a new line and removing a variable number of spaces from the beginning of each line. Let me demonstrate how it looks.

Snippet to get original docstring:

(thread-first
  (cider-var-info "clojure.core/reduce")
  (nrepl-dict-get "doc"))

Original docstring:

f should be a function of 2 arguments. If val is not supplied,
  returns the result of applying f to the first 2 items in coll, then
  applying f to that result and the 3rd item, etc. If coll contains no
  items, f must accept no arguments as well, and reduce returns the
  result of calling f with no arguments.  If coll has only 1 item, it
  is returned and f is not called.  If val is supplied, returns the
  result of applying f to val and the first item in coll, then
  applying f to that result and the 2nd item, etc. If coll contains no
  items, returns val and f is not called.

Snippet to get formatted docstring:

(thread-first
  (cider-var-info "clojure.core/reduce")
  (nrepl-dict-get "doc")
  cider-docstring--format)

Formatted docstring:

f should be a function of 2 arguments. If val is not supplied,
returns the result of applying f to the first 2 items in coll, then
applying f to that result and the 3rd item, etc. If coll contains no
items, f must accept no arguments as well, and reduce returns the
result of calling f with no arguments.

If coll has only 1 item, it
is returned and f is not called.

If val is supplied, returns the
result of applying f to val and the first item in coll, then
applying f to that result and the 2nd item, etc. If coll contains no
items, returns val and f is not called.

It seems that nobody has noticed this strange formatting because cider-docstring--format is currently used only for docstrings in the minibuffer and in completions and not many people look at the docstrings there (though it will also be beneficial to use it for docstrings in documentation buffers).

As discussed in #3701, removing two spaces from the beginning while formatting the docstring will be good enough.

katomuso commented 4 weeks ago

I should've mentioned that this formatting is done specifically before displaying the docstring to the user. We remove two spaces from the beginning because this is a literal docstring from the source code. Since there are two spaces at the beginning of lines in docstrings, we remove them to make the docstring visually align nicely.

yuhan0 commented 4 weeks ago

Agreed that this is a much too idiosyncratic and 'surprising' formatting choice to be applying across the board to all docstrings, especially if (as the inline comment suggests) it was intended for the specific case of clojure.core/reduce.

vemv commented 4 weeks ago

. is used beyond reduce - it's relatively frequent across clojure.core.

I find the original reduce docstring 100% unreadable - paragraphs give readers breathing space. It's part of the intended semantics of . which is a different construct from ..

(You used paragraphs in your OP as well 😄)

Anyway, the reformatting is primarily intended to be used within a specific context, namely trimming docstrings semantically up to N lines. Splitting by . allows to split in a more fine-grained manner while also not trimming in a random (non 'semantic') place.

Nobody else has complained about this since its introduction, so please let's default to doing nothing until real problems pop up, if any.

Thanks!

bbatsov commented 4 weeks ago

@vemv I think here the question is whether those spaces are intentional or accidental, though. And to me they definitely seem accidental, so treating them specially seems like a mistake on our part.

vemv commented 4 weeks ago

I'm pretty sure they're intentional, although sometimes inconsistent.

'Two spaces after a period' is definitely a thing in English.

I thought of asking over https://ask.clojure.org/ if they could use . more frequently so that tools have the chance to detect paragraphs in clojure.core.

(Especially given that larger docstring changes there seem highly unlikely)

katomuso commented 4 weeks ago

@vemv It's okay if you want to leave splitting paragraphs be, but currently it is done incorrectly (note how the "If coll has only 1 item, it is returned and f is not called." paragraph is broken into two lines). And it's also worthwhile to replace not all spaces from the beginning, but exactly two because there can be some additional indentation, and it breaks it.

bbatsov commented 4 weeks ago

'Two spaces after a period' is definitely a thing in English.

But it's super controversial and rather uncommon these days. If I were to wager a guess, it'd be that some of them are to do with early usage of Emacs for Clojure development (by Rich), as Emacs was the last tool that was pushing for 2 spaces by default. Semantically there's no difference between single space and double space between sentences.

If someone can actually have a real paragraph in the docstring why rely on something obscure instead? I'm reasonably sure those are just accidental.

katomuso commented 4 weeks ago

I agree with @vemv that two spaces after a paragraph's dot are intentional and a common idiom in comments, but I think this is too opinionated to make every single paragraph on a separate line.

bbatsov commented 4 weeks ago

I agree with @vemv that two spaces after a paragraph's dot are intentional and a common idiom in comments, but I think this is too opinionated to make every single paragraph on a separate line.

I'm a bit puzzled by this - if it's just our assumption those mark paragraphs, how do we know it's a common idiom to denote them? I've spent quite a bit of time exploring the elements of style in both English and programming and that's not something I've ever come across.

katomuso commented 4 weeks ago

Well, I mean it's common in the Clojure standard library, and periodically I find them in Emacs as well. But yeah, outside of the standard library, they are very rare, so we should not rely on that assumption.

bbatsov commented 4 weeks ago

But in Emacs by default it's 2 spaces all the time (Emacs even has docstring lint checks) and paragraphs are just paragraphs, that's why I don't understand why you think they have some special meaning. Perhaps some examples will help me to understand what you mean?

yuhan0 commented 4 weeks ago

Might also be worth pointing out that periods can be used for other purposes like ASCII diagrams, which would be completely broken by such a formatting choice.

A somewhat contrived example:

(defn translate-morse
  "Convert the dots-and-dashes representation of Morse code into a string:
  .-   => A   .    => E    (etc.)
  -... => B   .--. => F
  -.-. => C   --.  => G
  -..  => D   .... => H"
  [input]
  ,,,)

More broadly it seems like an overreach for CIDER to make any assumptions on behalf of the author of the docstring and their intended layout, beyond the bare minimum (e.g. trimming the 2-spaces before each line, converting backticked symbols into links etc.)

katomuso commented 4 weeks ago

Can we conclude that for now it will be okay to just trim 2 spaces before each line for docstrings which are intended to be displayed?

yuhan0 commented 3 weeks ago

I think this issue should remain open, the part about ". " formatting hasn't been resolved.

For clarity's sake, here's the current implementation applied to the above translate-morse docstring:

(with-current-buffer "repro.clj"
  (thread-first
    (cider-var-info "repro/translate-morse")
    (nrepl-dict-get "doc")
    cider-docstring--format
    princ))

=>

Convert the dots-and-dashes representation of Morse code into a string:
.-   => A   .

=> E    (etc.)
-... => B   .--. => F
-.-. => C   --.

=> G
-..

=> D   .... => H
katomuso commented 3 weeks ago

@yuhan0 Hmm, are you sure you are using the latest version of cider-docstring--format? I guess that's the case because current cider-docstring--format is as simple as (replace-regexp-in-string "\n " "\n" string).

yuhan0 commented 3 weeks ago

Oops you're right, had things mixed up with git worktrees and stale elc's. All resolved on the latest version :)