max-sixty opened this issue 1 year ago
Yeah, we should never format s-strings. I think the second option is the best one, and we can fall back to the third for the time being.
I'm going to look at this. I'll explore a fix based on the second option above to see how feasible it is.
Great, @BlurrechDev!
If that becomes unwieldy (i.e. we're passing a huge struct along), then we can reassess.
Feel free to post half-complete code — either for feedback or to merge something initial.
Great. If you need some pointers, this is how I'd do it:
- $s_string_1, $s_string_2, $s_string_3, ... is ok.
- Vec<(String, String)> (vec of pairs of generated and actual s-strings).

This issue prevents folks from using the escape hatch, so bumping this to "Priority"...
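A minimal sketch of those pointers, assuming the formatter leaves a $s_string_N token untouched and that each s-string already has its interpolations translated. SStringTable, register and restore are hypothetical names for illustration, not part of the PRQL compiler.

```rust
/// Hypothetical helper: swaps each s-string for a `$s_string_N` placeholder
/// before formatting and records (placeholder, original) pairs so they can be
/// swapped back afterwards.
#[derive(Default)]
struct SStringTable {
    // (generated placeholder, actual s-string text)
    pairs: Vec<(String, String)>,
}

impl SStringTable {
    /// Records `original` and returns the placeholder to emit in its place.
    fn register(&mut self, original: &str) -> String {
        let placeholder = format!("$s_string_{}", self.pairs.len() + 1);
        self.pairs.push((placeholder.clone(), original.to_string()));
        placeholder
    }

    /// Puts the original s-strings back into the formatted SQL.
    /// Iterates in reverse so `$s_string_10` is replaced before
    /// `$s_string_1`, which is a prefix of it.
    fn restore(&self, formatted: &str) -> String {
        let mut out = formatted.to_string();
        for (placeholder, original) in self.pairs.iter().rev() {
            out = out.replace(placeholder.as_str(), original);
        }
        out
    }
}

fn main() {
    let mut table = SStringTable::default();
    let p = table.register("COUNT ( DISTINCT my_col)");
    // Pretend this came back from the formatter with the placeholder intact.
    let formatted = format!("SELECT\n  {p}\nFROM\n  my_table");
    println!("{}", table.restore(&formatted));
}
```

The reverse iteration is the one non-obvious choice: with ten or more s-strings, replacing $s_string_1 first would also clobber the start of $s_string_10.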
I've done a little work on this, but it's much harder than I anticipated.
My idea was to replace s-strings with unique identifiers before compiling to SQL. After the SQL is formatted, we can replace the identifiers with the original s-string text.
Starting PRQL:
from my_table
select s"COUNT ( DISTINCT {my_col})"
AST:
From: my_table
Select:
  SString:
    - "COUNT ( DISTINCT "
    - my_col
    - ")"
SStrings extracted:
From: my_table
Select:
  SString:
    - '_anchor_1'
    - my_col
    - '_anchor_2'
Compiled to SQL:
SELECT '_anchor_1'my_col'_anchor_2' FROM my_table
Formatted:
SELECT
  '_anchor_1' my_col '_anchor_2'
FROM
  my_table
Inject SStrings back in:
SELECT
  COUNT ( DISTINCT my_col )
FROM
  my_table
...which is pretty close to what we'd want. The spacing after COUNT and before DISTINCT was preserved, as intended. But because the formatter adds spaces between the anchors and the column, there are spaces around my_col.
I'm not sure we want to merge this, as it feels like a workaround using hacky text manipulation.
Could we replace the whole s-string with a single variable?
So the expression to be formatted is:
SELECT
- '_anchor_1'my_col'_anchor_2'
+ $_s_string_
FROM my_table
...and then we replace the variable after the formatting? So the S-string is completely opaque to the formatter.
"a workaround using hacky text manipulation"
This used to be all of the compiler! 😀
That's a good idea, but a bit problematic because you still have to translate my_col somehow. If we could do that separately, then that's the way to go.
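On translating my_col separately: a minimal sketch, assuming an s-string can be modelled as literal SQL fragments interleaved with interpolated expressions. SStringPart and render_s_string are hypothetical, and the translate closure stands in for whatever step of the compiler turns a PRQL expression into SQL. Because literals are copied verbatim, the rendered string could then be swapped for one opaque placeholder before formatting.

```rust
/// Hypothetical, simplified model of an s-string: literal SQL fragments
/// interleaved with interpolated PRQL expressions (`{my_col}`).
enum SStringPart {
    Literal(String),
    Interpolation(String), // PRQL expression source, e.g. "my_col"
}

/// Renders the s-string: literals are copied byte-for-byte and only the
/// interpolated expressions go through `translate` (a stand-in for the
/// compiler's real expression translation).
fn render_s_string<F>(parts: &[SStringPart], mut translate: F) -> String
where
    F: FnMut(&str) -> String,
{
    let mut out = String::new();
    for part in parts {
        match part {
            SStringPart::Literal(text) => out.push_str(text),
            SStringPart::Interpolation(expr) => out.push_str(&translate(expr.as_str())),
        }
    }
    out
}

fn main() {
    // s"COUNT ( DISTINCT {my_col})"
    let parts = vec![
        SStringPart::Literal("COUNT ( DISTINCT ".to_string()),
        SStringPart::Interpolation("my_col".to_string()),
        SStringPart::Literal(")".to_string()),
    ];
    // Toy translation: qualify the column with its table.
    let sql = render_s_string(&parts, |expr| format!("my_table.{expr}"));
    println!("{sql}"); // COUNT ( DISTINCT my_table.my_col)
}
```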
I was thinking of starting on this. But is it now intractable, given that everything has been an s-string since the stdlib changes?
Or could we format the expressions that go into the s-strings separately? That seems quite difficult, if I'm thinking about it correctly.
No, not really. The s-strings in std.sql.prql have a completely separate codepath and never land in the AST as s-strings. So it's still as tractable as it was before.
Ah great! I hadn't realized that.
I remember you asking about this just recently: https://github.com/PRQL/prql/issues/2694#issuecomment-1575693384 :D
Sorry, that's bad memory even by my standards!
Another use case for this, with the Postgres range operator:
from foo
select x=s"range @> time"
produces:
SELECT
  range @ > time AS x
FROM
  foo
which is invalid because of the extra space inserted into @>.
Any workaround would be appreciated, thanks!
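To illustrate the placeholder idea on this example: toy_format is a stand-in that reproduces only the reported symptom (@> split into @ >) and is not the real formatter's logic, and format_protecting is a hypothetical protect, format, restore step. For brevity it finds the s-string by searching the compiled SQL, whereas a real fix would substitute the placeholder during compilation.

```rust
/// Stand-in for the SQL formatter: it reproduces only the reported symptom
/// (splitting `@>` into `@ >`) and is NOT the real formatter's logic.
fn toy_format(sql: &str) -> String {
    sql.replace("@>", "@ >")
}

/// Hypothetical protect -> format -> restore step for a single s-string.
fn format_protecting(sql: &str, s_string: &str, formatter: impl Fn(&str) -> String) -> String {
    let placeholder = "$s_string_1";
    // Hide the s-string so the formatter never sees its contents.
    let protected = sql.replace(s_string, placeholder);
    // Format, then put the original text back byte-for-byte.
    formatter(protected.as_str()).replace(placeholder, s_string)
}

fn main() {
    let sql = "SELECT range @> time AS x FROM foo";
    // Without protection the operator is broken apart:
    println!("{}", toy_format(sql)); // SELECT range @ > time AS x FROM foo
    // With protection it survives unchanged:
    println!("{}", format_protecting(sql, "range @> time", toy_format)); // SELECT range @> time AS x FROM foo
}
```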
The workaround is to disable formatting. In the CLI there is --no-format, and in the Rust API there is Options::format, but in the Playground this option is not accessible.
From #965, broadening that issue to being able to avoid the auto-formatting. For context, this can be important with non-standard SQL, like Snowflake's semi-structured data.
A couple of options for how we could do this:
- … $s_string_n. This requires passing s-strings all the way through the compiler, reducing the modularity of the compiler phases, but is otherwise fairly simple.
- … Options struct to disable formatting entirely. This has the disadvantage of being an all-or-nothing option, but might be an acceptable temporary solution.