balanceCommentsList makes a mess of relative positions

Take an input like:

~~~~.hs
mySignature :: Bool -- comment 1
-- comment 2

myList = [ 1, 2, 3 ]
~~~~

Initially, the two comments are connected to myList, and while myList itself has no direct information about its relative position to the last element (comment 2), it can easily be derived from deltas over anchor positions.
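To make "derived from deltas over anchor positions" concrete, here is a minimal sketch. `Span`, `Delta` and `deltaBetween` are toy stand-ins invented for this illustration, not the real GHC or ghc-exactprint types, and the column handling is simplified (end columns are exclusive, and the column on a new line is measured from column 1).

```haskell
-- Toy stand-ins for illustration only; not the GHC / ghc-exactprint types.
data Span = Span
  { startLine :: Int
  , startCol  :: Int
  , endLine   :: Int
  , endCol    :: Int  -- exclusive
  } deriving (Eq, Show)

data Delta
  = SameLine Int           -- columns to skip on the same line
  | DifferentLine Int Int  -- lines to move down, then offset from column 1
  deriving (Eq, Show)

-- Delta needed to get from the end of the previous element
-- to the start of the next one.
deltaBetween :: Span -> Span -> Delta
deltaBetween prev next
  | lineGap == 0 = SameLine (startCol next - endCol prev)
  | otherwise    = DifferentLine lineGap (startCol next - 1)
  where
    lineGap = startLine next - endLine prev

-- Spans for the example input: "-- comment 2" on line 2, myList on line 4.
comment2Span, myListSpan :: Span
comment2Span = Span 2 1 2 13
myListSpan   = Span 4 1 4 21

-- deltaBetween comment2Span myListSpan evaluates to DifferentLine 2 0
```

A line delta of 2 means one blank line in between, which is exactly the information that is easy to get while comment 2 is still attached to myList.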

However, once you apply balanceCommentsList, comment 1 and comment 2 get moved to somewhere inside the mySignature tree. I.e. we are in a situation where:

1) mySignature has a sourceSpan that covers just one line, excluding the comments.
2) myList has a sourceSpan that covers just one line. It has no comments attached, and it carries no information about the preceding annotated element.
3) The comments still exist, but somewhere inside the mySignature tree.

The good news is that now inserting a new decl after mySignature has a chance to work without being inserted before "comment 1". The bad news is that retaining the empty line before myList seems really cumbersome, because it needs to scan the preceding element for comments, take the last one, compute a DP from that etc? I dunno.
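The "scan the preceding element for comments" step could look roughly like the sketch below. It only tracks line numbers, and `Node`, `lastOccupiedLine` and `blankLinesBefore` are hypothetical names for this illustration; the real annotated AST is considerably more involved.

```haskell
-- Toy model: a node knows the lines of its own span, the comments stored
-- directly on it, and its children. Not the real ghc-exactprint AST.
data Node = Node
  { nodeLines    :: (Int, Int)    -- first and last line of the node's span
  , nodeComments :: [(Int, Int)]  -- line ranges of comments on this node
  , nodeChildren :: [Node]
  }

-- Last line touched by the node itself or by any comment buried inside it.
lastOccupiedLine :: Node -> Int
lastOccupiedLine n =
  maximum
    ( snd (nodeLines n)
    : map snd (nodeComments n) ++ map lastOccupiedLine (nodeChildren n)
    )

-- Number of blank lines to keep between two adjacent declarations.
blankLinesBefore :: Node -> Node -> Int
blankLinesBefore prev next = fst (nodeLines next) - lastOccupiedLine prev - 1

-- The situation after balancing: both comments live somewhere inside
-- mySignature, whose own span still covers only line 1.
mySignature, myList :: Node
mySignature = Node (1, 1) [] [Node (1, 1) [(1, 1), (2, 2)] []]
myList      = Node (4, 4) [] []

-- blankLinesBefore mySignature myList evaluates to 1
```

Using only the declaration spans would give 4 - 1 - 1 = 2 here, so the full traversal is needed just to recover a single blank line.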

We can modify the scenario by adding a comment:

~~~~.hs
mySignature :: Bool -- comment 1
-- comment 2

-- comment 3
myList = [ 1, 2, 3 ]
~~~~

which changes things slightly: after balanceCommentsList, "comment 3" is still associated with myList and keeps a reference to the preceding "comment 2". But if you use this as a tool for refactoring and want to insert a decl between mySignature and myList, this seems to become a mess too.

A few thoughts:

1) Shouldn't decls themselves carry a DP or a reference to the preceding element? Why add this information only to comments?
2) It is really annoying to have to compute DPs as a separate step. Things like "there are two blank lines between decls 2 and 3" should be encoded in some more obvious way.
3) One option would be to encode any additional whitespace as a comment. Comments make the code more readable, but don't change the program's semantics. The same goes for whitespace. It is the same thing really. I.e.

~~~~.hs
foo = 42

bar = 43
~~~~

could be encoded as `[foo-node with an after-comment of "one empty line", bar-node]`. Or the other way around if you prefer. Even making comments first-class seems to make sense, i.e. `[foo-decl, whitespace-comment, bar-decl]` as the elements of the module. The shape of the AST then changes if you drop the annotations, but the current approach is basically stage-dependent-shape anyway, so...
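A rough sketch of the first-class variant, assuming a flat list of module items. `Item` and `render` are made up for this illustration and are not proposed API; real decls would of course be AST nodes rather than strings.

```haskell
-- Hypothetical module-level items: declarations, comments and whitespace
-- all sit at the same level of the list.
data Item
  = Decl String       -- already-printed declaration text
  | Comment String    -- a line comment, without the leading "-- "
  | BlankLines Int    -- explicit vertical whitespace
  deriving (Eq, Show)

-- Printing is a plain left-to-right walk; no deltas need to be derived.
render :: [Item] -> String
render = unlines . concatMap go
  where
    go (Decl s)       = [s]
    go (Comment c)    = ["-- " ++ c]
    go (BlankLines n) = replicate n ""

example :: [Item]
example = [Decl "foo = 42", BlankLines 1, Decl "bar = 43"]

-- Inserting a decl after foo is an ordinary list splice.
withBaz :: [Item]
withBaz = take 1 example ++ [BlankLines 1, Decl "baz = 44"] ++ drop 1 example
```

With this shape, the blank line before `bar` survives the insertion without any position arithmetic, because it is an element of the list rather than something implied by source spans.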

4) As long as srcSpans are relevant for deriving DPs, moving a comment inside a node should enlarge the node's srcSpan, just for consistency. Either a comment is "at the same level" and excluded from the srcSpan, or it is "below" somewhere and included. This means that moving comments/balancing becomes even messier, but that's a consequence of the general approach of using srcSpans to encode how much whitespace exists.
5) Pleaaaaaase make it possible, and not a complete hassle, to exactprint individual decls.
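The consistency rule from point 4 can be sketched as follows: whenever a comment is moved into a node, the node's span grows to cover it. The types and the names `covering` and `absorbComment` are assumptions for this illustration only and do not correspond to anything in ghc-exactprint.

```haskell
type Pos = (Int, Int)  -- (line, column); tuples compare lexicographically

data Span = Span { spanStart :: Pos, spanEnd :: Pos }
  deriving (Eq, Show)

data Node = Node { nodeSpan :: Span, nodeComments :: [Span] }
  deriving (Eq, Show)

-- Smallest span that contains both arguments.
covering :: Span -> Span -> Span
covering (Span s1 e1) (Span s2 e2) = Span (min s1 s2) (max e1 e2)

-- Move a comment "below" a node and widen the node's span to include it.
absorbComment :: Span -> Node -> Node
absorbComment c (Node sp cs) = Node (covering sp c) (cs ++ [c])

-- mySignature on line 1, "-- comment 2" on line 2 (end columns exclusive).
sigNode :: Node
sigNode = Node (Span (1, 1) (1, 20)) []

sigWithComment :: Node
sigWithComment = absorbComment (Span (2, 1) (2, 13)) sigNode
-- nodeSpan sigWithComment evaluates to Span (1,1) (2,13)
```

With spans kept consistent like this, the delta to the next decl could again be computed from the two top-level spans alone.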

alanz / ghc-exactprint

balanceCommentsList makes a mess of relative positions #119