PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0

Don't format s-strings #1284

Open max-sixty opened 1 year ago

max-sixty commented 1 year ago

From #965, broadening that issue to cover being able to avoid the auto-formatting. For context, this can be important with non-standard SQL, like Snowflake's semi-structured data.

A couple of options for how we could do this:

aljazerzen commented 1 year ago

Yeah, we should never format s-strings. I think the second is the best option and we can fall back to using the third for the time being.

BlurrechDev commented 1 year ago

I'm going to look at this. I will explore a fix for this based on the second option above to see how feasible this is.

max-sixty commented 1 year ago

Great @BlurrechDev !

If that becomes unwieldy (i.e. we're passing a huge struct along), then we can reassess.

Feel free to post half-complete code — either for feedback or to merge something initial.

aljazerzen commented 1 year ago

Great. If you need some pointers, this is how I'd do it:

max-sixty commented 1 year ago

This issue prevents folks from using the escape hatch, so bumping this to "Priority"...

aljazerzen commented 1 year ago

I did a little work on this, but it's much harder than I anticipated.

My idea was to replace s-strings with some unique identifier before compiling to SQL. After SQL is formatted, we can replace the identifier back with s-strings.

Starting PRQL:

from my_table
select s"COUNT ( DISTINCT {my_col})"

AST:

From: my_table
Select:
  SString:
  - "COUNT ( DISTINCT "
  - my_col
  - ")"

SStrings extracted:

From: my_table
Select:
  SString:
  - '_anchor_1'
  - my_col
  - '_anchor_2'

Compiled to SQL:

SELECT '_anchor_1'my_col'_anchor_2' FROM my_table

Formatted:

SELECT
  '_anchor_1' my_col '_anchor_2'
FROM
  my_table

Inject SStrings back in:

SELECT
  COUNT ( DISTINCT  my_col )
FROM
  my_table

... which is pretty close to what we'd want. The spacing after COUNT and before DISTINCT was preserved, as intended. But because formatting adds spacing between anchors, there are spaces around my_col.

I'm not sure we want to merge this, as it feels like a workaround using hacky text manipulation.
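To make the spacing problem concrete, here is a minimal Python sketch of the anchor round trip described above. It is not the compiler's code: `extract_anchors`, `toy_format` and `inject` are hypothetical names, and `toy_format` is only a stand-in that puts one space between adjacent tokens, which is the behavior of the real formatter that matters here.

```python
import re

def extract_anchors(parts):
    """Swap literal s-string pieces for quoted anchors; keep interpolated names."""
    anchors = {}
    out = []
    for kind, text in parts:
        if kind == "lit":
            key = f"'_anchor_{len(anchors) + 1}'"
            anchors[key] = text
            out.append(key)
        else:
            out.append(text)
    return "".join(out), anchors

def toy_format(sql):
    """Hypothetical stand-in for the SQL formatter: one space between tokens."""
    tokens = re.findall(r"'[^']*'|\w+", sql)
    return " ".join(tokens)

def inject(sql, anchors):
    """Put the original literal text back in place of each anchor."""
    for key, text in anchors.items():
        sql = sql.replace(key, text)
    return sql

# s"COUNT ( DISTINCT {my_col})" split into literal and interpolated parts
parts = [("lit", "COUNT ( DISTINCT "), ("expr", "my_col"), ("lit", ")")]

expr, anchors = extract_anchors(parts)   # '_anchor_1'my_col'_anchor_2'
formatted = toy_format(expr)             # '_anchor_1' my_col '_anchor_2'
result = inject(formatted, anchors)      # COUNT ( DISTINCT  my_col )
print(result)
```

The literal pieces survive untouched, but the spaces the formatter inserts between the anchors and `my_col` remain after injection, which is where the extra spaces around `my_col` come from.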

max-sixty commented 1 year ago

Is there anything to having the whole S-string as a variable?

So the expression to be formatted is:

SELECT 
- '_anchor_1'my_col'_anchor_2' 
+ $_s_string_
FROM my_table

...and then we replace the variable after the formatting? So the S-string is completely opaque to the formatter.
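A rough Python sketch of that idea, under stated assumptions: `protect`, `toy_format` and `restore` are hypothetical helpers, `toy_format` is a stand-in for the real formatter, and the s-string is assumed to be already rendered to SQL text (including `my_col`), which this sketch does not attempt.

```python
import re

PLACEHOLDER = re.compile(r"\$_s_string_\d+")

def protect(rendered_s_strings):
    """Map each already-rendered s-string to one opaque placeholder."""
    return {f"$_s_string_{i}": text
            for i, text in enumerate(rendered_s_strings, 1)}

def toy_format(sql):
    """Hypothetical stand-in formatter: clause keywords on their own line."""
    lines = []
    for tok in sql.split():
        if tok.upper() in {"SELECT", "FROM"}:
            lines.append(tok)
        else:
            lines.append("  " + tok)
    return "\n".join(lines)

def restore(sql, table):
    """Replace each placeholder with its original text, byte for byte."""
    return PLACEHOLDER.sub(lambda m: table[m.group(0)], sql)

# Assumption: interpolated names were translated to SQL before this step.
table = protect(["COUNT ( DISTINCT my_col)"])
unformatted = "SELECT $_s_string_1 FROM my_table"

result = restore(toy_format(unformatted), table)
print(result)
```

Because the formatter only ever sees `$_s_string_1`, it has no tokens inside the s-string to re-space. Matching placeholders with a regex rather than plain string replacement avoids `$_s_string_1` clobbering the start of `$_s_string_10`.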


> a workaround using hacky text manipulation.

This used to be all of the compiler! 😀

aljazerzen commented 1 year ago

That's a good idea, but a bit problematic because you have to translate my_col somehow. If we could do this separately, then that's the way to go.

max-sixty commented 1 year ago

I was thinking of starting on this. But is it now intractable — everything is an s-string since the stdlib changes?

Or could we format the expressions that go into the s-strings separately? That seems quite difficult, if I'm thinking about it correctly.

aljazerzen commented 1 year ago

No, not really. The s-strings in std.sql.prql have a completely separate codepath and never land in the AST as s-strings.

So it's still as tractable as it was before.

max-sixty commented 1 year ago

Ah great! I hadn't realized that

aljazerzen commented 1 year ago

I did remember you just asking about this: https://github.com/PRQL/prql/issues/2694#issuecomment-1575693384

:D

max-sixty commented 1 year ago

Sorry, that's bad memory even by my standards!

philpep commented 1 month ago

Another use case for this, with the Postgres range operator:

from foo
select x=s"range @> time"

Produces:

SELECT
  range @ > time AS x
FROM
  foo

Which is invalid due to the extra space added in @ >.

Any workaround would be appreciated, thanks!

aljazerzen commented 1 month ago

The workaround is to disable formatting. In the CLI, there is --no-format; in the Rust API, there is Options::format; but in the Playground, this option is not accessible.