TidierOrg / Tidier.jl

Meta-package for data analysis in Julia, modeled after the R tidyverse.
MIT License
524 stars 14 forks source link

Variables defined outside of the macro do not work inside macros #39

Closed aramirezreyes closed 1 year ago

aramirezreyes commented 1 year ago

Hi!

Thanks for the package! I took it for a spin this morning and noticed variables outside of the scope of the macro are not correctly resolved inside macros. This includes constants like pi.

using DataFrames, Tidier

a = DataFrame(A = 1:5, B = 'a':'e')
hehe = 2
@chain a begin
    @mutate c = hehe*A
end

Which throws:

ArgumentError: column name "hehe" not found in the data frame; existing most similar names are: "A" and "B"

Stacktrace:
  [1] lookupname

This also happens with global constants.

using DataFrames, Tidier

a = DataFrame(A = 1:5, B = 'a':'e')
@chain a begin
    @mutate c = pi*A
end
using DataFrames, Tidier
global toto=5
a = DataFrame(A = 1:5, B = 'a':'e')
@chain a begin
    @mutate c = toto*A
end

Versions:

Julia Version 1.9.0-beta4
Commit b75ddb787ff (2023-02-07 21:53 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 256 × AMD EPYC 7713 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
  Threads: 9 on 256 virtual cores
Environment:
  LD_LIBRARY_PATH = /opt/nvidia/hpc_sdk/Linux_x86_64/22.5/math_libs/11.7/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7/extras/CUPTI/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7/extras/Debugger/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7/nvvm/lib64:/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7/lib64:/opt/cray/pe/papi/6.0.0.16/lib64:/opt/cray/pe/gcc/11.2.0/snos/lib64:/opt/cray/libfabric/1.15.2.0/lib64
  JULIA_REVISE_POLL = 1
  JULIA_NUM_THREADS = 8

and

[f0413319] Tidier v0.4.1
kdpsingh commented 1 year ago

Thanks for catching this. Tidy expressions make this tricky because the package doesn't explicitly use symbols to refer to columns of data.

I can think of a few ways to handle global constants. Let me review and will reply with a proposed solution that works within the context of the package.

aramirezreyes commented 1 year ago

Just updated the issue because it seems that all the variables outside of the macro are not resolved

aramirezreyes commented 1 year ago

Ok, reconsidering, I think I may close this if you agree as interpolation sorry for the noise!

kdpsingh commented 1 year ago

Update: I had an older version of Tidier. Interpolation works as intended!

While interpolation should work (at least for user-defined globals), it seems like it's not working in this case. That said, interpolation was originally envisioned to handle interpolation of column names from symbols, although I recently extended it to cover values.

Here's a way that will work for both built-in globals (like pi) and user-defined globals.

using Tidier

global toto=5

a = DataFrame(A = 1:5, B = 'a':'e')

@chain a begin
    @mutate C = Main.toto * A
end

@chain a begin
    @mutate C = Main.pi * A
end

I'll close this issue, but if that doesn't work let me know. At some point, I'll revisit interpolation as a potential solution to this because I agree that interpolation should work.

The key thing here is that tidy expressions by default refer to column names, so to refer to variables outside of the data frame, you need to prefix them with the environment (Main.A) or use interpolation (although admittedly that fails here, which I will look at).

You are likely aware but I'll add here for others who may come across this issue. In the code above, the word global is redundant. Since it's defined outside of a function or package module, it's automatically a global variable in the Main module.

kdpsingh commented 1 year ago

Actually, before closing, I think this is a great example of something we should add to the documentation. Will close this out after the docs are updated.

aramirezreyes commented 1 year ago

Sounds good. Thanks!

El mar. 6, 2023, a las 10:37 a.m., Karandeep Singh @.***> escribió:

Actually, before closing, I think this is a great example of something we should add to the documentation. Will close this out after the docs are updated.

— Reply to this email directly, view it on GitHub https://github.com/kdpsingh/Tidier.jl/issues/39#issuecomment-1456737764, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABVLG6ML64D4JXWQ3VLW4S3W2YVFJANCNFSM6AAAAAAVROCRU4. You are receiving this because you authored the thread.

kdpsingh commented 1 year ago

I figured out why I was getting an error with !! interpolation. I had the wrong version of Tidier!

Note to self: Pkg.add() doesn't automatically update the package but Pkg.update() does. So yes, both of your examples should work using !!toto or !!pi.

I'm still going to present the Main.pi example in the documentation because it may be useful as an alternative. But otherwise I'm glad that this works as designed.