golang / go

x/tools/gopls: feature: repair the syntax #69171

Open adonovan opened 2 weeks ago

adonovan commented 2 weeks ago

When editing, it is common for the code to be ill-formed, because for example you have opened a block with { but not yet closed it, or pasted some code into the middle of a function and not yet integrated it. A parser with good error recovery (i.e. better than go/parser, at least for now) can often produce an AST with minimal lossage, implicitly inserting the missing close braces and suchlike. In principle, the difference between the pretty-printed repaired AST and the actual source could be offered as a completion candidate or a quick fix, "repair the syntax".

Even with our not-very-fault-tolerant parser, I suspect we could do a good job with modest effort by a pre-scan of the input file from both ends that matches each paren with its partner, exploiting indentation (column numbers). This would quickly localize the region of damage to a particular function, or even a block, statement, or expression within it. We would then parse and pretty-print just the errant subtree, causing missing parens (etc) to be inserted; Bad{Expr,Stmt,Decl}s would be replaced by /* ... */ comments. The result would be a well-formed tree that would allow the user to save, gofmt, and perhaps build and run tests. It may also re-enable use of cross-references and other features that are crippled in the vicinity of the syntax error. (And given the parser's current lack of fault tolerance, the "vicinity" may be more accurately described as a "blast radius".)
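
A minimal sketch of the pre-scan idea, under stated assumptions: it scans in one direction only (front to back), uses go/scanner for tokens, and treats "the first token on a line sits at the same column" as the indentation signal. The names delim, closerFor, and findUnmatched are hypothetical and are not part of gopls or go/parser. It only localizes the damage; it does not repair or re-parse anything.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// delim records one delimiter token and where it was seen.
type delim struct {
	Tok    token.Token
	Line   int
	Col    int
	Indent int // column of the first token on the delimiter's line
}

// closerFor maps each opening delimiter to its closing partner.
var closerFor = map[token.Token]token.Token{
	token.LPAREN: token.RPAREN,
	token.LBRACK: token.RBRACK,
	token.LBRACE: token.RBRACE,
}

// findUnmatched returns the delimiters in src that have no partner.
// A closer that starts its line prefers an opener whose line has the
// same indentation; otherwise the nearest opener of the right kind wins.
func findUnmatched(src []byte) []delim {
	fset := token.NewFileSet()
	file := fset.AddFile("input.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, src, nil, 0) // nil handler: lexical errors are ignored

	var stack, bad []delim
	line, indent := 0, 0
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.SEMICOLON && lit == "\n" {
			continue // automatically inserted semicolon
		}
		p := fset.Position(pos)
		if p.Line != line {
			line, indent = p.Line, p.Column
		}
		d := delim{tok, p.Line, p.Column, indent}
		if _, ok := closerFor[tok]; ok {
			stack = append(stack, d)
			continue
		}
		if tok != token.RPAREN && tok != token.RBRACK && tok != token.RBRACE {
			continue
		}
		match, fallback := -1, -1
		for i := len(stack) - 1; i >= 0; i-- {
			if closerFor[stack[i].Tok] != tok {
				continue
			}
			if fallback < 0 {
				fallback = i // nearest opener of the right kind
			}
			if d.Col == d.Indent && stack[i].Indent == d.Col {
				match = i // indentation agrees: preferred partner
				break
			}
		}
		if match < 0 {
			match = fallback
		}
		if match < 0 {
			bad = append(bad, d) // stray closer with no opener at all
			continue
		}
		bad = append(bad, stack[match+1:]...) // skipped openers were never closed
		stack = stack[:match]
	}
	return append(bad, stack...) // openers still pending at EOF
}

func main() {
	src := "package p\n\nfunc f() {\n\tif x {\n\t\tg(1, 2\n}\n"
	for _, d := range findUnmatched([]byte(src)) {
		fmt.Printf("%d:%d: unmatched %s\n", d.Line, d.Col, d.Tok)
	}
}
```

For this input, the final } at column 1 is paired with the { of func f because their lines share an indentation, so the sketch reports the { on line 4 and the ( on line 5 as unclosed rather than blaming the whole function. Columns are byte offsets, so this only works when indentation is consistent, as it is in gofmt'd code.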

(The inspiration was a conversation about LLMs with @josharian.)

findleyr commented 2 weeks ago

I think this is an interesting idea, and worth exploring.

I'll just note that for better or worse, I almost always operate on mostly coherent syntax, due to the combination of auto-pairing delimiters and using my editor to select and operate on surrounding blocks. As a result, the type of extreme corruption you describe isn't (usually) a problem, and for me at least it would probably suffice to add some very-obviously-correct synchronization to go/parser (such as always synchronizing to func foo, which is amazingly something we don't do). Maybe I've just been trained to do this by our poor parser recovery.

I think we should fix parser recovery first, at least making the obvious improvements, and then reevaluate.
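
One way to gauge how far the current recovery reaches is to parse a damaged file and see which function declarations survive and how many Bad nodes appear. The sketch below uses only go/parser, go/ast, and go/token; the recovery type and inspectRecovery are hypothetical names chosen for illustration, and the exact counts will depend on the Go version.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// recovery summarizes what go/parser salvaged from one file.
type recovery struct {
	Funcs []string // names of the function declarations present in the AST
	Bad   int      // number of ast.BadExpr, ast.BadStmt, and ast.BadDecl nodes
	Err   error    // syntax errors reported by the parser, if any
}

// inspectRecovery parses src, keeps going past syntax errors, and
// walks whatever partial AST the parser returns.
func inspectRecovery(src string) recovery {
	fset := token.NewFileSet()
	// AllErrors asks the parser to report every error instead of
	// stopping after the first few.
	file, err := parser.ParseFile(fset, "input.go", src, parser.AllErrors)
	r := recovery{Err: err}
	if file == nil {
		return r
	}
	ast.Inspect(file, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.FuncDecl:
			r.Funcs = append(r.Funcs, n.Name.Name)
		case *ast.BadExpr, *ast.BadStmt, *ast.BadDecl:
			r.Bad++
		}
		return true
	})
	return r
}

func main() {
	// func a is missing a closing paren; func b is well-formed.
	src := "package p\n\nfunc a() {\n\tg(1, 2\n}\n\nfunc b() {}\n"
	r := inspectRecovery(src)
	fmt.Println("funcs recovered:", r.Funcs)
	fmt.Println("bad nodes:", r.Bad)
	fmt.Println("errors:", r.Err)
}
```

If func b is missing from the recovered list, the parser skipped past a top-level func keyword while resynchronizing, which is the kind of obvious improvement described above.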

josharian commented 2 weeks ago

> This would quickly localize the region of damage to a particular function, or even a block, statement, or expression within it.

Unsurprisingly, I'd really love this feature. I'm apparently not as organized as @findleyr, and I almost always operate in a mode in which gopls works badly for me because I have left half-broken code lying around. 😅

> I suspect we could do a good job with modest effort by a pre-scan of the input file from both ends that matches each paren with its partner, exploiting indentation

It's also worth mentioning that Tree-sitter is a different approach to making a resilient parser, one with lots of miles on it and some good research papers describing it. It's unclear to me whether it would be possible to marry the ideas behind Tree-sitter and go/parser, though.

robpike commented 2 weeks ago

PL/C did this back in the 1970s and it wasn't very successful in my experience, but with gofmt in effect telling you the block structure through redundant indentation, it might be worth another try.