PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0

feat(fmt): An attempt at aesthetic items into PL #4639

Open max-sixty opened 1 week ago

max-sixty commented 1 week ago

By adding comments and linewraps (named "aesthetics" here) to PL, this is an attempt to get around the complications of combining lexer + parser output in prqlc fmt, which #4397 has hit in a few incarnations.

This very nearly works. With chumsky we can create a function that wraps anything that might have a leading or trailing comment, implement a trait on the AST items that contain it, and away we go (though it did require lots of debugging in the end...). The AST would then be really easy to write back out.
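A minimal sketch of the idea in plain Rust, without chumsky. The names `Aesthetic`, `WithAesthetics`, `Ident` and `write_ident` are hypothetical and are not prqlc's actual types; the sketch only shows how an AST item might carry its surrounding comments and linewraps and write them back out.

```rust
// Hypothetical names throughout; this is not prqlc's actual AST.

/// Something with no semantic meaning that should survive formatting.
#[derive(Debug, Clone, PartialEq)]
enum Aesthetic {
    /// Comment text without the leading `#`.
    Comment(String),
    LineWrap,
}

/// Implemented by any AST item that can carry aesthetics.
trait WithAesthetics: Sized {
    fn with_aesthetics(self, before: Vec<Aesthetic>, after: Vec<Aesthetic>) -> Self;
}

/// A stand-in for a real AST item.
#[derive(Debug, Clone, PartialEq)]
struct Ident {
    name: String,
    aesthetics_before: Vec<Aesthetic>,
    aesthetics_after: Vec<Aesthetic>,
}

impl Ident {
    fn new(name: &str) -> Self {
        Ident {
            name: name.to_string(),
            aesthetics_before: Vec::new(),
            aesthetics_after: Vec::new(),
        }
    }
}

impl WithAesthetics for Ident {
    fn with_aesthetics(mut self, before: Vec<Aesthetic>, after: Vec<Aesthetic>) -> Self {
        self.aesthetics_before = before;
        self.aesthetics_after = after;
        self
    }
}

/// Writes the item back out, re-emitting the aesthetics around it.
fn write_ident(item: &Ident) -> String {
    let mut out = String::new();
    for a in &item.aesthetics_before {
        match a {
            // A leading comment sits on its own line above the item.
            Aesthetic::Comment(text) => out.push_str(&format!("# {}\n", text)),
            Aesthetic::LineWrap => out.push('\n'),
        }
    }
    out.push_str(&item.name);
    for a in &item.aesthetics_after {
        match a {
            // A following comment stays on the same line as the item.
            Aesthetic::Comment(text) => out.push_str(&format!(" # {}", text)),
            Aesthetic::LineWrap => out.push('\n'),
        }
    }
    out
}
```

In a parser-combinator setting, the wrapping function would collect the `before` and `after` lists while parsing and call `with_aesthetics` on whatever item the inner parser produced.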

This requires comments to lead or follow tokens that are part of an AST item. I think there's literally a single case where it doesn't work, which is when a comment follows the final trailing comma of a tuple or array. So apart from that case, a comment always leads or follows a token that's part of an AST item.

...so tests fail at the moment, on that case.
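The failing case can be illustrated with a toy token model. This is a simplified sketch, not how prqlc lexes or parses: it assumes a comment can attach to an adjacent item token, can follow a closing bracket (the end of a tuple item) or lead an opening bracket (the start of one), and can never attach to a comma. The `Token` enum and `orphan_comments` function are hypothetical.

```rust
// Toy model; the token kinds and attachment rules are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Item(String),    // identifier, literal, etc.
    Open,            // `{` or `[`
    Close,           // `}` or `]`
    Comma,           // separator, not part of any item
    Comment(String),
}

/// Returns the indices of comments that neither follow nor lead an AST item.
/// Neighbouring comments are skipped when looking for the nearest token.
fn orphan_comments(tokens: &[Token]) -> Vec<usize> {
    let is_comment = |t: &&Token| matches!(t, Token::Comment(_));
    let mut orphans = Vec::new();
    for (i, tok) in tokens.iter().enumerate() {
        if !matches!(tok, Token::Comment(_)) {
            continue;
        }
        // Nearest non-comment token on each side.
        let prev = tokens[..i].iter().rev().find(|t| !is_comment(t));
        let next = tokens[i + 1..].iter().find(|t| !is_comment(t));
        // A comment may follow an item or a closed tuple...
        let follows_item = matches!(prev, Some(Token::Item(_)) | Some(Token::Close));
        // ...or lead an item or the start of a tuple.
        let leads_item = matches!(next, Some(Token::Item(_)) | Some(Token::Open));
        if !follows_item && !leads_item {
            orphans.push(i);
        }
    }
    orphans
}
```

For a token stream corresponding to `{a, b, # trailing` followed by `}` on the next line, the comment sits between the final comma and the closing bracket, so it is reported as an orphan. For `{a, # about b` followed by `b}`, the comment leads `b` and nothing is reported. Under these simplified rules a comment inside an empty tuple would also be an orphan, so the toy is only an approximation of the real parser.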

Next we need to consider:

max-sixty commented 1 week ago

One thing we could land from this is doc-comments, which must be attached to an item, and which we may want to push through the AST. It's much less important to the project than getting prqlc fmt to work, but would let us merge something from this work rather than having a bunch of PRs & branches gradually accumulating merge conflicts...

aljazerzen commented 1 week ago

Oh, this is an interesting idea: instead of discarding all comments and new lines, we keep them in the first AST, so they can be re-incorporated into codegen.

It's a shame that it doesn't work for all the cases. It would be a beautiful solution for formatting comments. Could we parse trailing comma as an aesthetic too?


re doc comments: yes, they are very similar. I think they should be allowed only on statements, so that's even simpler to parse.
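A rough sketch of why restricting doc comments to statements keeps things simple: each statement needs only one optional field, and attachment is a single forward pass. The sketch assumes a `#!` prefix marks a doc comment and that each statement fits on one line; both are assumptions made for illustration, and `Stmt` and `attach_doc_comments` are hypothetical names.

```rust
// Hypothetical types; assumes `#!` marks a doc comment and one statement per line.
#[derive(Debug, PartialEq)]
struct Stmt {
    doc_comment: Option<String>,
    source: String,
}

/// Attaches each run of doc-comment lines to the statement that follows it.
fn attach_doc_comments(input: &str) -> Vec<Stmt> {
    let mut stmts = Vec::new();
    let mut pending: Vec<String> = Vec::new();
    for line in input.lines() {
        let line = line.trim();
        if let Some(doc) = line.strip_prefix("#!") {
            // Hold the doc text until the next statement appears.
            pending.push(doc.trim().to_string());
        } else if !line.is_empty() {
            let doc_comment = if pending.is_empty() {
                None
            } else {
                Some(pending.join("\n"))
            };
            pending.clear();
            stmts.push(Stmt {
                doc_comment,
                source: line.to_string(),
            });
        }
    }
    stmts
}
```

Because a doc comment can only precede a statement, there is no equivalent of the trailing-comma problem: a doc comment with no following statement is simply dropped here, and a real parser could report it as an error instead.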

max-sixty commented 1 week ago

> It's a shame that it doesn't work for all the cases. It would be a beautiful solution for formatting comments. Could we parse trailing comma as an aesthetic too?

Yeah, I added some more thoughts at https://github.com/PRQL/prql/pull/4397#issuecomment-2187146337. It's possible to do it this way, but not as elegant as I first thought — I think it would require lots of backtracking and custom parsers (i.e. not just delimited_by...) to be able to distinguish between the two cases in that comment...