TidierOrg / TidierDB.jl

Tidier database analysis in Julia, modeled after the dbplyr R package.
MIT License
43 stars 3 forks source link

How can I write functions that pass arguments do TidierDB functions? #20

Open vituri opened 4 months ago

vituri commented 4 months ago

Example: I want to write a function f that filter some data in my database. However, I don't want to fix the columns that can be filtered.

How can I create something like the following:

function f(con, conditions)
    @chain begin
        DB.db_table(con, :My_table)
        DB.@filter(conditions)
        .....................
        DB.@collect
    end
end

?

If I try something like f(con, Id == 1) I get the error UndefVarError:Idnot defined which arises from Julia trying to evaluate my expression. Is there some workaround?

drizk1 commented 4 months ago

To do this, you need to use bang bang interpolation and pass conditions you need as strings for filter. Column names could be given as vectors of symbols for select or group_by.

Unfortunately, the value of what you interpolate does not seem to update with additional calls so this will need some troubleshooting

function f(con, conditions)
   add_interp_parameter!(:conditions, :conditions)
    @chain begin
        db_table(con, :My_table)
        @filter(!!conditions)
        @show_query
    end
end

f(con, "Id == 1")
WITH cte_1 AS (
SELECT *
        FROM My_table
        WHERE Id = 1)  
SELECT *
        FROM cte_1
drizk1 commented 4 months ago

There is a workaround.

So, unfortunately, the way interpolation is currently set up, you cannot write a function and have the new interpolation values ready (run time/compile time issue i think I hope to one day get rid of).

You can, however, use a macro to get around this.

macro f(con, conditions, columns)
    add_interp_parameter!(:conditions, conditions)
    add_interp_parameter!(:columns, columns)

    return quote
       @chain db_table(con, :My_table) begin
        @filter(!!conditions)
        @select !!columns 
        @show_query
       end
    end
end 
@f(con, "ID > 0", "column_test")
SELECT *
        FROM My_table
        WHERE ID > 0)  
SELECT column_test
        FROM cte_1
@f(con, "ID > 1000", name)
WITH cte_1 AS (
SELECT *
        FROM My_table
        WHERE ID > 1000)  
SELECT name
        FROM cte_1
kdpsingh commented 4 months ago

This is a great workaround. Agree this should eventually be fixable.

drizk1 commented 4 months ago

I have been tinkering with the TidierData's parse_interpolation to try to see if I can get a version of it to fit without any luck quite yet.

For now, I can add brief documentation page on this since it's functional (but has limitations) and others may find it useful, but leave this issue open.

I think I will also open an interpolation specific issue outlining what I've done and the challenges/barriers experienced

drizk1 commented 4 months ago

@vituri I found a fully functional work around that will let you update the values and interpolate into filter instatements and correctly and anywhere else.

I wrote a macro called @interpolate (for now with source code below) that properly escapes and allows you to change values and interpolate.I am not fully sure why this works (i think it some brings the scopes onto the same level @kdpsingh probably knows) but it lets you build a macro that so that you can pass arguments to TidierDB chains. Unfortunately TidierDB uses strings from some of the sql logic and it needs access to the expression contents before it begins the quote. This means without a big rewrite this will be the most graceful method for now, I think.

arguments are con which I plan to remove and as many tuples of what you want to interpolate (:var_name, var_value)

Of note: you cannot chain @interpolate and your macro together.

When you have a moment, try this out to see if it works for you. If so I will add docs and can release it.

using Pkg; Pkg.add(url = "https://github.com/TidierOrg/TidierDB.jl#broaden-interpolation-ability")
using TidierDB
con = connect(:duckdb);

macro my_chain(con, conditions, columns)
    return quote
        @chain db_table(con, :My_table) begin
        @filter(Day in [!!conditions])
        @select(!!columns)
        @show_query
       end
    end
end 

col_names = [:mpg, :gas, :wt]
dates = ["2024-12-10", "2024-12-5"]
@interpolate(con, (:conditions, dates), (:columns, col_names));
@my_chain(con, dates, col_names)
WITH cte_1 AS (
SELECT *
        FROM My_table
        WHERE Day in ('2024-12-10', '2024-12-5'))  
SELECT mpg, gas, wt
        FROM cte_1
col_names = [:cyl, :model, :gear]
weeks = ["WEEK 1", "WEEK 2"]
@interpolate(con, (:conditions, weeks), (:columns, col_names));
@my_chain(con, weeks, col_names)
WITH cte_1 AS (
SELECT *
        FROM My_table
        WHERE Day in ('WEEK 1', 'WEEK 2'))  
SELECT cyl, model, gear
        FROM cte_1

here is the source code in case you are curious.

macro interpolate(con, args...)
    exprs = Expr[]
    for arg in args
        if !(arg isa Expr && arg.head == :tuple)
            throw(ArgumentError("Each argument must be a tuple"))
        end
        name, value = arg.args
        push!(exprs, :(esc(add_interp_parameter!(Symbol($name), $((value))))))
    end
    return esc(Expr(:block, exprs...))
end
vituri commented 4 months ago

Because my mind can't understand all this black-magic metaprogramming, I was trying to manipulate the SQL string that is outputed by finalize_query and substitute the vectors "manually" by the corresponding "IN (v1, v2, ..., vn)".

I tried your solution and it works fine outside a function. But what if I want something like that:

macro bd_eventos(con, dias)
    return quote
        @chain db_table(con, :Resumo_frota) begin
        TidierDB.@filter(Day in [!!dias])
        TidierDB.@show_query
       end
    end
end

function bd_eventos(con, dias)
    @interpolate(con, (:dias, dias));

    res = @bd_eventos(con, dias)

    res
end

dias = ["2024-01-04"]
bd_eventos(con, dias)

When should I use @interpolate(con, (:dias, dias))? Inside a function it does not work (that is: the values are not correct; it uses the same old value many times).

drizk1 commented 4 months ago

The extra syntax creates a lot of friction, so im working on a different approach where logic is turned into a function and macros interpolate into that.

I've made some progress. I have it so that regular !! interpolation works like in TidierData in simpler macros (select, group_by, arrange, distinct although it breaks tidy selection). I am trying to figure out filter/mutate/summarize next but theyre a bit trickier. hopefully all be simplified.