Closed drizk1 closed 2 months ago
Ooh I will take a look shortly.
Will share feedback by tomorrow. Thanks for continued work on it.
all good! no rush! I'm just enjoying my free time
edit, was just resolving the conflicts
Given how swamped I've been, I would suggest moving forward with all this.
My main question: is sql_agg()
doing something different than sql()
in dbplyr? If not, should we just call it sql()
?
sql_agg
and sql
are different.
sql
in dbplyr is used to directly inject a complete string in any function vs sql_agg
only works with @mutate
to allow any aggregate function to be used in mutate, not just those in the parsers. So it still interplays with the underlying parsers to correctly add windowing allowing it to work.
I havent added sql
yet because I'm how to make it play nicely with the metadata (something that is not an issue here)
so the difference would be in dbplyr it would be
mtcars |>
group_by(cyl) |>
mutate(sql("regr_intercept(mpg, hp) OVER (PARTITION BY cyl ) AS test")
vs here
DB.@chain DB.db_table(con, :mtcars) begin
DB.@group_by cyl
DB.@mutate test = sql_agg("regr_intercept(mpg, hp)")
Got it. In that case, suggest moving forward!
Does sql_agg
insert raw strings in a way that can be used in other functions like summarize or filter? Or only mutate?
It throws an error if it's used with @summarize.
someone could use it elsewhere to inject a string elsewhere.
I could change this behavior and make it error if used elsewhere as well
So i think for now that I will just leave sql_agg
stashed away in the parsers while i refine it and make it better, and will put it in the readme/news at a later. so I guess it would be an easter egg for anyone who has seen the commits/explore the code
Another potential name for it would be sql_raw()
. I think it's fine to leave it as an internal function as long as it's not intended for users to use it directly. But if you do export it, document it and just state its limitations.
sql_raw
could work as well as a name. Its not as explicit, but I would be cool with that too.
For now I have commented it out of parsers, and will bring it back online when I add stricter erroring for it to force it to only be used in @mutate
On an unrelated note, I have added Athena joining support (tested successfully with self joins for now) and will merge and release shortly. It adds a slight syntax difference only for Athena Joins where there is an additional argument with the parameters (i am brainstorming ways to fix this but I wanted people who might need it to be able to use it now so it is covered in the docs)
below pulls from the docs covering the joins in athena
@chain from_query(mtcars) begin
@full_join(demodb.mtcars, cyl, cyl, athena_params)
@group_by(efficiency)
@summarize(avg_hp = mean(hp))
@collect
end
This PR
addssql_agg
connect()
to remove need to load external packages for setting up non duckdb connections and auto sets the sql mode #14from_query
to allow reusing query portion/cte multiple timesBefore merging i wanted to run
sql_agg
by you, @kdpsingh , especially the name. By a stroke of luck when building Tidierdb,@summarize
can handle any sql aggregate function as long as the sql syntax is correctly written regardless of which aggregate functions are built into the parsers, which makes it pretty flexible. However, due to SQL behavior with window functions,@mutate
can do this for generic sql functions but not for aggregate functions .sql_agg
is is similar to dbplyrssql()
, which allows user to directly put a sql string into a query.. However, the main difference here is thatsql_agg
is only used in@mutate
to give users flexibility to use any aggregate function (outside of those in the parsers) from any dialect in their query, while the available sql parsers expand. (duckdb and clickhouse alone have many agg functions for example)So the two questions for
sql_agg
are: is there a preferable name tosql_agg
, and should something likesql_agg
be includedbrief example
connection examples (
connect
auto selects the parser as well)