TidierOrg / Tidier.jl

Meta-package for data analysis in Julia, modeled after the R tidyverse.
MIT License

Variable not defined when using bang-bang operator in loop #103

Closed. tjburch closed this issue 1 year ago.

tjburch commented 1 year ago

I've been playing around with the package for the last few days and have run into an issue when using the bang-bang operator in a loop. Here's an MWE, where I'm trying to expand a DataFrame so that it has a unique column for each signal/data pair:

using DataFrames
using Tidier

base_data = DataFrame(
    Id = 1:100, 
    Signal = rand(["signal1", "signal2", "signal3"], 100),
    data1 = rand(100),
    data2 = rand(100),
)

df_list = []
for col in [:data1, :data2]
    df_temp = @chain base_data begin
        @select(Id, Signal, !!col)
        @pivot_wider(names_from=Signal , values_from = !!col)
    end
    col_names = names(df_temp)[2:end]
    rename_dict = Dict(name => Symbol("$(name)_$(col)") for name in col_names)
    df_temp = rename(df_temp, rename_dict)

    push!(df_list, df_temp)
end

df_input = reduce((x, y) -> @left_join(x, y, :Id), df_list)

Running this in interactive mode yields "ERROR: LoadError: UndefVarError: col not defined", flagged at the first !!col usage. Is this improper usage of the bang-bang operator, or is there some underlying issue? If I explicitly define col = :data1 and run a single iteration of the loop body, it works, which makes me think it might be a scoping error. Thanks in advance for any clarity.

Using Julia 1.9.2 and:

  [a93c6f00] DataFrames v1.5.0
  [f0413319] Tidier v0.7.6

kdpsingh commented 1 year ago

Thanks for posting. There are actually two issues here. I've figured out the first one but am still working on the second one. One limitation to bang-bang interpolation as it is currently implemented in Tidier is that it can only access global variables. Because the col variable is defined inside of the for loop as the iterator, it isn't accessible to the bang-bang operator inside of Tidier.
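
To illustrate the kind of limitation described above, here is a minimal sketch in plain Julia. The macro @global_value is hypothetical and is not Tidier's implementation; it only assumes that a macro resolves a name by evaluating it in global scope while the macro is being expanded. A global resolves fine, but a loop iterator does not exist as a global at that point.

```julia
# Hypothetical macro, not Tidier's implementation: it resolves a name by
# evaluating it in the calling module's global scope during macro expansion.
macro global_value(name)
    return QuoteNode(Core.eval(__module__, name))
end

target = :data1                   # a global variable
resolved = @global_value target   # works: `target` exists when the macro expands
println(resolved)

# The loop is only quoted here, so its macro call is not expanded yet.
loop_expr = :(for col in [:data1, :data2]
    @global_value col
end)

expansion_failed = try
    # Expansion happens before any iteration runs, so no global `col` exists.
    macroexpand(@__MODULE__, loop_expr)
    false
catch err
    println("expansion failed with ", typeof(err))
    true
end
```

The exact exception type printed in the catch branch may differ between Julia versions, so it is not shown here.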

I was partly able to fix this problem by defining a gcol global variable outside of the for loop, assigning the global variable gcol to match col inside of the for loop, and then using !!gcol instead of !!col.

But even this doesn't entirely fix the error (it just results in a different error), so I'm trying to figure out if this is due to a problem within Tidier or a problem with the code.

At the heart of it, this is a variable scoping issue, but the fact that I can't solve it using the global variable workaround is a problem. We need to figure this out.

kdpsingh commented 1 year ago

This is now fixed in 6a71deb. You can’t use !! interpolation inside a for loop to refer to the iterator because the macros get expanded before the loop is run.
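
The expansion-order point can be seen without Tidier. The sketch below uses a hypothetical macro named @inspect (an illustration only, not part of Tidier or Julia) that records what it receives at expansion time and pairs it with the value the same expression has at run time.

```julia
# Hypothetical macro for illustration: captures a description of the expression
# it receives at expansion time, then evaluates that expression at run time.
macro inspect(ex)
    received = string("macro received ", repr(ex), " (", typeof(ex), ")")
    return :(($received, $(esc(ex))))
end

results = []
for col in [:data1, :data2]
    # The macro is expanded once, before the loop runs; it only sees the name `col`.
    push!(results, @inspect col)
end
foreach(println, results)
```

The first element of each tuple is the same in both iterations because the macro only ever saw the symbol col, while the second element changes because it is evaluated while the loop runs.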

Here’s the updated version of your code, which should work once you re-install the package from GitHub. The dev documentation page on interpolation has also been updated to include a simple example of this: https://tidierorg.github.io/Tidier.jl/dev/examples/generated/UserGuide/interpolation/

using DataFrames
using Tidier

base_data = DataFrame(
    Id = 1:100, 
    Signal = rand(["signal1", "signal2", "signal3"], 100),
    data1 = rand(100),
    data2 = rand(100),
)

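# Global placeholder that the Tidier macros can see; it is reassigned on each iteration.
gcol = Symbol()
df_list = []
for col in [:data1, :data2]
    # Copy the loop iterator into the global so it can be looked up in Main.
    global gcol = col
    df_temp = @chain base_data begin
        @select(Id, Signal, @eval(Main, gcol))
        @pivot_wider(names_from = Signal, values_from = @eval(Main, gcol))
    end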
    col_names = names(df_temp)[2:end]
    rename_dict = Dict(name => Symbol("$(name)_$(col)") for name in col_names)
    df_temp = rename(df_temp, rename_dict)

    push!(df_list, df_temp)
end

df_input = reduce((x, y) -> @left_join(x, y, Id), df_list)