eclipse-archived / ceylon.formatter

A formatter for the Ceylon programming language, written in Ceylon.
Apache License 2.0
14 stars 11 forks source link

Blank lines #24

Closed lucaswerkmeister closed 10 years ago

lucaswerkmeister commented 10 years ago

So just one day after I wrote that “Architecture is mostly finished now” (af0b4de), I stumbled over a part that’s still really hacky :D

Currently, the various visitX just call fWriter.nextLine() at their leasure, which always forces a newline. This leads to ugly newline distributions like in the simpleClass test.

I could probably shuffle the nextLines around and introduce some flags for corner cases (no blank line before method if it’s at the beginning of the compilation unit, blank line at class beginning iff parameter list was multiline, etc.) so that it almost always looks right, but that seems a little ugly to me, and I really think that newline handling belongs in the FormattingWriter as well, so how should the interface look like?

lucaswerkmeister commented 10 years ago

Suggestion

// before a function/method
fWriter.blankLines {
    0->0,
    1->1000
}; // means: I’d love to have one blank line above a method, but no blank lines are okay too

// after a function/method
fWriter.blankLines {
    0->0,
    1->500
}; // means: there should be a blank line after most methods

// after a body’s opening brace
value haveBlankLine = headerWasMultiLine then 2000 else -2000;
fWriter.blankLines {
    0->-haveBlankLine,
    1->haveBlankLine
}; // means: after the brace, no blank line (-2000 overrides method’s 1000),
   // except if the header was multi-line (e. g. long argument lists, type restrictions)

// before a body’s closing brace
fWriter.blankLines {
    0->10000
    1->-10000
}; // means: never put a blank line before the closing brace. -10000 overrides 1000.

The formattingWriter sums up the blank lines, then picks the one with the maximum value.

lucaswerkmeister commented 10 years ago

What I noticed while writing down the previous suggestion: It only uses 0 and 1 as keys. Negative values obviously don’t make sense, but is there any context where you’d want more than one consecutive blank line in code? If no, that model is probably too complicated. (I still like the idea of using the numbers as priorities – just like the spaceBefore/spaceAfter model – but the values seem unnecessarily complex.)

lucaswerkmeister commented 10 years ago

I went looking for consecutive blank lines in the Ceylon source code:

find ceylon.language/src/ ceylon-sdk/source/ -type f -name '*.ceylon' -exec \
    awk -v RS='\n[[:space:]]*\n[[:space:]]*\n' '{ N++ } END { if (N>1) print FILENAME }' {} \;

24 matching files (all in the SDK), and all the double blank lines fall into one of these two categories:

  1. between top-level declarations (after imports, between classes / methods)
  2. after/before top-level opening/closing braces (mostly classes)

It’s tempting to declare them “invalid – won’t support”, but I think the first category is legitimate.

(Resource.ceylon has a triple blank line, but I really don’t think we have to support that.)

lucaswerkmeister commented 10 years ago

A few observations:

lucaswerkmeister commented 10 years ago

double blank lines only occur between top-level declarations (top-level = no indentation)

Let’s verify that claim:

find ceylon.language/src/ ceylon-sdk/source/ -type f -name '*.ceylon' -exec \
    awk -v RS='\n[[:space:]]+\n[[:space:]]+\n' '{ N++ } END { if (N>1) print FILENAME }' {} \;

Only three matches:

I think we can safely live with that restriction.

Can we say that top-level declarations always have two (or no – e. g. imports) blank lines in between?

Eh... See for example annotations.ceylon – many toplevel declarations, all separated by single blank lines.

But we can probably live with that model for now (maybe make the double blank lines an option), and possibly refine it later.

Explicit “no blank line here” seems to always beat explicit “blank line here”

Still not sure about this one, I’ll try to find a counterexample.

And by the way, if anyone else is reading this, by all means feel free to pitch in (this is tagged “request for comments” after all)!

lucaswerkmeister commented 10 years ago

Completely different suggestion

Interlude: A vaguely related problem

We currently discard the user’s code, and thus turn this:

value s = "Hello";
value l = s.size;

value i = 3; // independent from code above, thus separated by blank line

into this:

value s = "Hello";
value l = s.size;
value i = 3; // separation gone

(minus the comment change, of course). In other words, we discard any blank lines between instructions, which might have been there for very good reasons (for instance, separating sections of the code). I thought that we could ignore this – frankly, because I was too lazy to think about it, and also because it kinda breaks #17 – but I’ve come to the conclusion that we can’t.

But with that in mind, it seems to me that a solution to the problem at hand is clear: don’t discard the user’s blank lines – instead, canonize them.

Suggestion

The border between any two tokens has a range of allowed amounts of line breaks (0..0: line break forbidden – e. g. inside a <GroupedType>; 0..1: optional line break, but no blank line – e. g. after a , in an argument list; 1..2: forced line break, blank line allowed – e. g. before a closing }). Each token can set such a range for the border before and after it; the effective range of a border is the intersection of the two ranges. (An empty intersection is probably a bug, but I’ll have to look into that.) If the resulting range has more than one element, the number of newlines in the existing code is clamped into the range; if there is no existing code (see #17), the lower border is used (or maybe that could be an option as well). For various types of tokens, the ranges could be adjusted with options.

Example

Options:

blankLinesAfterImports=1..2
blankLinesBetweenStatements=0..1
blankLinesBeforeClosingBrace=0..0

Excerpt from formatting code:

fWriter.writeToken {
    "}";
    beforeToken=blanks2newlines(options.blankLinesBeforeClosingBrace);
    afterToken=1:2;
    context=context;
};
// ...
"Helper function"
Range<Integer> blanks2newlines(Range<Integer> blanks) {
    return (blanks.first+1)..(blanks.last+1);
}

Input:

void run() {
    print("Hello!");

    print("Olá!");

    print("你好");

}

Output:

void run() {
    print("Hello!");

    print("Olá!");

    print("你好");
}

What happens at the end? The value 2 (newlines = 1 blank line) is clamped into the intersection of 1..2 (blanks2newlines(options.blankLinesBetweenStatements)) and 1..1 (blanks2newlines(options.blankLinesBeforeClosingBrace)), resulting in 1 newline.

lucaswerkmeister commented 10 years ago

Note that the assumption

Explicit “no blank line here” seems to always beat explicit “blank line here”

enters this model via the fact that we use the intersection, not the union, of the two ranges.

All in all, I think I really like this model. We’ll see what I think of it when I have to implement it :D

lucaswerkmeister commented 10 years ago

Okay, I went with the new model. I’ve found even more things about it that I like, and all in all, it was easier to implement than I had feared.

Copy+paste of the commit message (Markdownified), since it’s relevant and nicely concludes this issue:

Until now, you could only call nextLine(), which would simply force a new line to be written. This was ridiculously unsophisticated and led to ugly corner cases (there should be a blank line between two methods, but not necessarily between a method and the class’s opening or closing brace).

In the new model, any token can declare how many line breaks it wants to allow before and after it. For example, a statement-ending ; allows 0..2 line breaks behind it (0 for comments, 2 for blank lines), a body’s closing brace allows 0..1 line breaks before it (0 for single-command blocks, 1 is the “normal” case). For the border between two tokens, the intersection of these ranges is made, and then the amount of existing line breaks in the token stream is clamped into that range and used.

The fact that we reuse the existing line breaks and only “sanitize” them is really cool, because it means that the following is (with appropriate options) already well-formatted:

void run() {
    print("Hello");

    // do something different
    value v = process.arguments.first;
    if (exists v) { print(v); }

    print("Bye");
}

Note the blank lines that separate sections of the code, as well as the one-line block. Both were not possible previously.

It also means that we use the user’s line wrapping if it’s already okay:

fun(callChain(longArguments(north, something1), arg),
    callChain(longArguments(east, something2), arg),
    callChain(longArguments(south, something3), arg),
    callChain(longArguments(west, something4), arg));

(I wanted to write that Eclipse would move the callChain token up because it still fits into the previous line, as I remember that having been really annoying – but Eclipse no longer seems to do that. The formatter must’ve gotten smarter sometime.) This might make #11 obsolete.

What’s still missing is heaps of options for these line break ranges, and documentation. I’ll add these in separate commits.

Many of the test cases had to be updated because the unformatted code was too ugly, and some of the line breaks would be used in the formatted output.