Closed lucaswerkmeister closed 10 years ago
// before a function/method
fWriter.blankLines {
0->0,
1->1000
}; // means: I’d love to have one blank line above a method, but no blank lines are okay too
// after a function/method
fWriter.blankLines {
0->0,
1->500
}; // means: there should be a blank line after most methods
// after a body’s opening brace
value haveBlankLine = headerWasMultiLine then 2000 else -2000;
fWriter.blankLines {
0->-haveBlankLine,
1->haveBlankLine
}; // means: after the brace, no blank line (-2000 overrides method’s 1000),
// except if the header was multi-line (e. g. long argument lists, type restrictions)
// before a body’s closing brace
fWriter.blankLines {
0->10000
1->-10000
}; // means: never put a blank line before the closing brace. -10000 overrides 1000.
The formattingWriter
sums up the blank lines, then picks the one with the maximum value.
What I noticed while writing down the previous suggestion: It only uses 0
and 1
as keys. Negative values obviously don’t make sense, but is there any context where you’d want more than one consecutive blank line in code? If no, that model is probably too complicated. (I still like the idea of using the numbers as priorities – just like the spaceBefore
/spaceAfter
model – but the values seem unnecessarily complex.)
I went looking for consecutive blank lines in the Ceylon source code:
find ceylon.language/src/ ceylon-sdk/source/ -type f -name '*.ceylon' -exec \
awk -v RS='\n[[:space:]]*\n[[:space:]]*\n' '{ N++ } END { if (N>1) print FILENAME }' {} \;
24 matching files (all in the SDK), and all the double blank lines fall into one of these two categories:
It’s tempting to declare them “invalid – won’t support”, but I think the first category is legitimate.
(Resource.ceylon has a triple blank line, but I really don’t think we have to support that.)
A few observations:
double blank lines only occur between top-level declarations (top-level = no indentation)
Let’s verify that claim:
find ceylon.language/src/ ceylon-sdk/source/ -type f -name '*.ceylon' -exec \
awk -v RS='\n[[:space:]]+\n[[:space:]]+\n' '{ N++ } END { if (N>1) print FILENAME }' {} \;
Only three matches:
I think we can safely live with that restriction.
Can we say that top-level declarations always have two (or no – e. g. imports) blank lines in between?
Eh... See for example annotations.ceylon – many toplevel declarations, all separated by single blank lines.
But we can probably live with that model for now (maybe make the double blank lines an option), and possibly refine it later.
Explicit “no blank line here” seems to always beat explicit “blank line here”
Still not sure about this one, I’ll try to find a counterexample.
And by the way, if anyone else is reading this, by all means feel free to pitch in (this is tagged “request for comments” after all)!
We currently discard the user’s code, and thus turn this:
value s = "Hello";
value l = s.size;
value i = 3; // independent from code above, thus separated by blank line
into this:
value s = "Hello";
value l = s.size;
value i = 3; // separation gone
(minus the comment change, of course). In other words, we discard any blank lines between instructions, which might have been there for very good reasons (for instance, separating sections of the code). I thought that we could ignore this – frankly, because I was too lazy to think about it, and also because it kinda breaks #17 – but I’ve come to the conclusion that we can’t.
But with that in mind, it seems to me that a solution to the problem at hand is clear: don’t discard the user’s blank lines – instead, canonize them.
The border between any two tokens has a range of allowed amounts of line breaks (0..0
: line break forbidden – e. g. inside a <GroupedType>
; 0..1
: optional line break, but no blank line – e. g. after a ,
in an argument list; 1..2
: forced line break, blank line allowed – e. g. before a closing }
). Each token can set such a range for the border before and after it; the effective range of a border is the intersection of the two ranges. (An empty intersection is probably a bug, but I’ll have to look into that.) If the resulting range has more than one element, the number of newlines in the existing code is clamped into the range; if there is no existing code (see #17), the lower border is used (or maybe that could be an option as well). For various types of tokens, the ranges could be adjusted with options.
Options:
blankLinesAfterImports=1..2
blankLinesBetweenStatements=0..1
blankLinesBeforeClosingBrace=0..0
Excerpt from formatting code:
fWriter.writeToken {
"}";
beforeToken=blanks2newlines(options.blankLinesBeforeClosingBrace);
afterToken=1:2;
context=context;
};
// ...
"Helper function"
Range<Integer> blanks2newlines(Range<Integer> blanks) {
return (blanks.first+1)..(blanks.last+1);
}
Input:
void run() {
print("Hello!");
print("Olá!");
print("你好");
}
Output:
void run() {
print("Hello!");
print("Olá!");
print("你好");
}
What happens at the end? The value 2
(newlines = 1 blank line) is clamped into the intersection of 1..2
(blanks2newlines(options.blankLinesBetweenStatements)
) and 1..1
(blanks2newlines(options.blankLinesBeforeClosingBrace)
), resulting in 1
newline.
Note that the assumption
Explicit “no blank line here” seems to always beat explicit “blank line here”
enters this model via the fact that we use the intersection, not the union, of the two ranges.
All in all, I think I really like this model. We’ll see what I think of it when I have to implement it :D
Okay, I went with the new model. I’ve found even more things about it that I like, and all in all, it was easier to implement than I had feared.
Copy+paste of the commit message (Markdownified), since it’s relevant and nicely concludes this issue:
Until now, you could only call nextLine()
, which would simply force a new line to be written. This was ridiculously unsophisticated and led to ugly corner cases (there should be a blank line between two methods, but not necessarily between a method and the class’s opening or closing brace).
In the new model, any token can declare how many line breaks it wants to allow before and after it. For example, a statement-ending ;
allows 0..2
line breaks behind it (0
for comments, 2
for blank lines), a body’s closing brace allows 0..1
line breaks before it (0
for single-command
blocks, 1
is the “normal” case). For the border between two tokens, the intersection of these ranges is made, and then the amount of existing line breaks in the token stream is clamped into that range and used.
The fact that we reuse the existing line breaks and only “sanitize” them is really cool, because it means that the following is (with appropriate options) already well-formatted:
void run() {
print("Hello");
// do something different
value v = process.arguments.first;
if (exists v) { print(v); }
print("Bye");
}
Note the blank lines that separate sections of the code, as well as the one-line block. Both were not possible previously.
It also means that we use the user’s line wrapping if it’s already okay:
fun(callChain(longArguments(north, something1), arg),
callChain(longArguments(east, something2), arg),
callChain(longArguments(south, something3), arg),
callChain(longArguments(west, something4), arg));
(I wanted to write that Eclipse would move the callChain
token up because it still fits into the previous line, as I remember that having been really annoying – but Eclipse no longer seems to do that. The formatter must’ve gotten smarter sometime.)
This might make #11 obsolete.
What’s still missing is heaps of options for these line break ranges, and documentation. I’ll add these in separate commits.
Many of the test cases had to be updated because the unformatted code was too ugly, and some of the line breaks would be used in the formatted output.
So just one day after I wrote that “Architecture is mostly finished now” (af0b4de), I stumbled over a part that’s still really hacky :D
Currently, the various
visitX
just callfWriter.nextLine()
at their leasure, which always forces a newline. This leads to ugly newline distributions like in thesimpleClass
test.I could probably shuffle the
nextLine
s around and introduce some flags for corner cases (no blank line before method if it’s at the beginning of the compilation unit, blank line at class beginning iff parameter list was multiline, etc.) so that it almost always looks right, but that seems a little ugly to me, and I really think that newline handling belongs in theFormattingWriter
as well, so how should the interface look like?