TidierOrg / TidierDB.jl

Tidier database analysis in Julia, modeled after the dbplyr R package.
MIT License
35 stars 3 forks source link

Generate SQL "A IN (v1, v2, ..., vn)" #17

Closed vituri closed 1 month ago

vituri commented 2 months ago

With dbplyr it is possible to write something like

...
filter(Column %in% v) %>% ...

which is translated to

A IN (v1, v2, ..., vn)

Is this currently implemented in TidierDB? If not, how can I help with it?

drizk1 commented 2 months ago

Right now we support the below pulled from the readme

DB.@filter(mpg_efficiency in ("moderate", "efficient"))
WHERE mpg_efficiency in ('moderate', 'efficient'))  

Is this what you are looking for?

vituri commented 2 months ago

Can I pass a vector instead of writing each element one by one?

drizk1 commented 2 months ago

So the ideal answer is yes.. but I just tried with the preliminary interpolation I have set up and there is an error which I show below where the whole vector is put in quotes rather than each individual item in quotes as it needs to be. So we will need to figure out this error (likely with parse_interpolation2) before you can pass the vector (this may not be an issue if your vector is numbers not strings but i did not test that)

add_interp_parameter!(:vector, ["moderate", "efficient"])
@chain db_table(db, :mtcars) begin
    @filter(!starts_with(model, "M"))
    @group_by(cyl)
    @summarize(mpg = mean(mpg))
    @mutate(mpg_squared = mpg^2, 
               mpg_rounded = round(mpg), 
               mpg_efficiency = case_when(
                                 mpg >= cyl^2 , "efficient",
                                 mpg < 15.2 , "inefficient",
                                 "moderate"))            
    @filter(mpg_efficiency in !!vector)
    @arrange(desc(mpg_rounded))
  # @show_query
    @collect
end
ERROR: Parser Error: syntax error at or near "'moderate, efficient'"
LINE 1: ...T * FROM cte_3 WHERE mpg_efficiency in 'moderate, efficient')  SELECT * FROM c...
                                                  ^

Edit this has to do with how parse_interpolation2 and the filter macro interact but I think I should be able to fix it soon

drizk1 commented 2 months ago

alright, i have a fix (locally still), its not ideal but its the most straight forward way to fix interpolation so that the vectors interpolated into filter are not double quoted.

however, sql syntax for the in statement needs parenthesis surrounding whats follows in. To do that you just need to put the interpolated value in brackets.

So the syntax to interpolate a vector would look like

@filter(mpg_efficiency in [!!vector])
 WHERE mpg_efficiency in ('moderate', 'efficient'))  

I'll try to get this up and available soon

vituri commented 2 months ago

Looks good! I'll be glad to test it when you are ready.

drizk1 commented 2 months ago

Alright @vituri this branch "fix-filter-parse_interpolation-bug" (sorry for the long name) should allow you to interpolate a vector of strings into filter like so. if it does, ill merge and close this issue and expand the docs. Just remember to add brackets around the item you are interpolating.

add_interp_parameter!(:vector, ["moderate", "efficient"])
# rest of your code
@filter(mpg_efficiency in [!!vector])
 WHERE mpg_efficiency in ('moderate', 'efficient'))  
vituri commented 2 months ago

Looks good!


dias = map([1:10;]) do x
    today() - Day(x)
end .|> string

DB.add_interp_parameter!(:dias, dias)

@db_temp @chain DB.db_table(con, :Resumo_frota) begin
    DB.@filter(Dia in [!!dias])
    @aside DB.@show_query(_)
end

returned the following SQL:

WITH cte_1 AS (
SELECT *
        FROM Resumo_frota
        WHERE Dia in ('2024-05-07', '2024-05-06', '2024-05-05', '2024-05-04', '2024-05-03', '2024-05-02', '2024-05-01', '2024-04-30', '2024-04-29', '2024-04-28'))  
SELECT *
        FROM cte_1

which is exactly what I needed. Thanks a lot!

drizk1 commented 2 months ago

No problem! I am happy it works

I have noticed you use aside in a few different code chunks. Does that macro allow you to @show_query and @collect in the same chain?

kdpsingh commented 2 months ago

@aside lets you do some action having a "side effect" (like writing to a file or logging an intermediate result to a variable) and then passes along the input into the @aside macro back to the output for the next item in the chain.

It's a part of Chain.jl

vituri commented 2 months ago

Sorry to reopen the issue! I am seeing some strange behaviour with this approach.

I want to create a function that takes a connection con to my database and a vector dias of strings and filter some table.

import TidierDB as DB

function filter_my_data(con, dias)
    DB.add_interp_parameter!(:dias, dias)

    @show dias

    @chain DB.db_table(con, :Resumo_frota) begin
        DB.@filter(Dia in [!!dias])
        DB.@distinct(Dia)
        DB.@show_query(_)
    end
end

# create the connection
con = conexao_mariadb()

dias = ["2024-05-01"]
filter_my_data(con, dias)

The result is

julia> filter_my_data(con, dias)
dias = ["2024-05-01"]
WITH cte_1 AS (
SELECT *
        FROM Resumo_frota
        WHERE Dia in (missing))  
SELECT *
        FROM cte_1

that is: it did not understand the vector dias. If, however, I run the following lines outside of a function call:

DB.add_interp_parameter!(:dias, dias)

@chain DB.db_table(con, :Resumo_frota) begin
    DB.@filter(Dia in [!!dias])
    DB.@show_query(_)
end

then the result is as expected:

julia> @chain DB.db_table(con, :Resumo_frota) begin
           DB.@filter(Dia in [!!dias])
           DB.@show_query(_)
       end
WITH cte_1 AS (
SELECT *
        FROM Resumo_frota
        WHERE Dia in ('2024-05-01', '2024-05-02'))  
SELECT *
        FROM cte_1
drizk1 commented 2 months ago

So unfortunately, this relates the interpolation issue outlined here #22 and what you recently experienced #20. Because of interpolations design right now and scoping issues, interpolation wont work with a function. And a macro must be used with some limitations applying.

drizk1 commented 1 month ago

Since this works with the, albeit limited, current version of interpolation and there's a few other interpolation related issues open currently, I'm going to go ahead and close this for now. Ofc reopen as needed or if you feel this was closed incorrectly.