TidierOrg / TidierData.jl

Tidier data transformations in Julia, modeled after the dplyr/tidyr R packages.
MIT License

I can no longer convert values with Dates.value in TidierData #86

Closed: awojtas-r closed this issue 8 months ago

awojtas-r commented 8 months ago

After updating to the newest TidierData (v0.14.6), I can no longer convert dates to integers (it still works with v0.10 on another machine). I have a simple DataFrame with a text column and a column of nanoseconds, and I'd like to create an additional column that converts ns to Int, but this throws the following error:

using Dates, TidierData

df = DataFrame(
        text = ["one","two","three"],
        ns = [Dates.Nanosecond(10), Dates.Nanosecond(20), Dates.Nanosecond(30)]
)

@chain df begin
    TidierData.@mutate(int = Dates.value(ns))
end

Error (only the first few lines, since the full stack trace is long):

MethodError: no method matching hasproperty(::Module, ::Expr)

Closest candidates are:
  hasproperty(::Any, !Matched::Symbol)
   @ Base reflection.jl:2095

(::TidierData.var"#7#8")(::Expr)@parsing.jl:391
walk@utils.jl:135[inlined]
postwalk@utils.jl:145[inlined]
parse_escape_function@parsing.jl:349[inlined]
var"#parse_function#2"(::Bool, ::Bool, ::typeof(TidierData.parse_function), ::Symbol, ::Expr)@parsing.jl:138
var"#parse_tidy#1"(::Bool, ::Bool, ::Bool, ::Bool, ::typeof(TidierData.parse_tidy), ::Expr)@parsing.jl:41
TidierData@parsing.jl:2[inlined]

Could you please advise?
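The MethodError suggests that the macro's parser passed the qualified name Dates.value to hasproperty, whose only method expects a Symbol as the property argument. Julia represents a module-qualified name as an Expr with head :. rather than as a Symbol. The sketch below illustrates that distinction using only Base and the Dates stdlib; it does not reproduce TidierData's actual parsing code, which is not shown in this thread.

```julia
using Dates  # stdlib; provides the Dates module referenced below

# The right-hand side of the @mutate call from the report, as a macro receives it.
ex = :(Dates.value(ns))

callee = ex.args[1]          # the function being called: Dates.value
@show callee isa Symbol      # false: a qualified name is not a bare Symbol
@show callee isa Expr        # true
@show callee.head            # :. (property access)

# hasproperty only has a method taking a Symbol as its second argument,
# so passing the Expr raises the same kind of MethodError seen above.
threw = try
    hasproperty(Dates, callee)
    false
catch err
    err isa MethodError
end
@show threw                  # true

# The bare name stored in the QuoteNode is a Symbol, which reflection accepts.
name = callee.args[2].value  # :value
@show isdefined(Dates, name) # true
```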

kdpsingh commented 8 months ago

Thanks, I will take a look and we will fix.

I'm pretty sure I know what's happening here, but I didn't think to test this scenario in my last update, so it's entirely possible that this used to work and no longer does.

drizk1 commented 8 months ago

In the meantime, this can serve as a temporary workaround: compute the vector outside the macro and interpolate it with !!.

test = Dates.value.(df.ns)
@chain df begin 
  TidierData.@mutate(int =  !!test)
end
3×3 DataFrame
 Row │ text    ns              int   
     │ String  Nanoseco…       Int64 
─────┼───────────────────────────────
   1 │ one     10 nanoseconds     10
   2 │ two     20 nanoseconds     20
   3 │ three   30 nanoseconds     30
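
Another possible stopgap, assuming DataFrames.jl is available in the active environment, is to bypass the TidierData macro parser entirely and use DataFrames.jl's own transform with ByRow. This is a different technique from the !! interpolation above and is not part of TidierData's API; it is only a sketch of the same conversion.

```julia
using DataFrames, Dates

df = DataFrame(
    text = ["one", "two", "three"],
    ns = [Dates.Nanosecond(10), Dates.Nanosecond(20), Dates.Nanosecond(30)],
)

# source column => ByRow(function) => destination column:
# applies Dates.value to each element of :ns and stores the Int64 results in :int.
df2 = transform(df, :ns => ByRow(Dates.value) => :int)
println(df2)
```

Because no macro rewriting is involved, this form does not depend on how TidierData parses qualified function names.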

kdpsingh commented 8 months ago

I was able to fix this, but while doing so I discovered another edge case where the solution from my prior commit doesn't quite work.

As I was fixing edge cases, I realized that it's relatively straightforward to implement a full-blown dynamic scoping solution for column names, one that will fix every possible edge case and ensure consistent behavior.

I am part of the way through implementing this. Stay tuned.

kdpsingh commented 8 months ago

Ok, going down this rabbit hole, I've realized it will take a bit of work to ensure that my new scoping code works consistently across all TidierData macros.

So rather than implement it now, I'm going to put in a simple fix so that this example works correctly. I should have this done in the next 2-3 days.

kdpsingh commented 8 months ago

This is now fixed. Here's the output in the latest version:

julia> df = DataFrame(
               text = ["one","two","three"],
               ns = [Dates.Nanosecond(10), Dates.Nanosecond(20), Dates.Nanosecond(30)]
       )
3×2 DataFrame
 Row │ text    ns             
     │ String  Nanoseco…      
─────┼────────────────────────
   1 │ one     10 nanoseconds
   2 │ two     20 nanoseconds
   3 │ three   30 nanoseconds

julia> @chain df begin 
           TidierData.@mutate(int =  Dates.value(ns))
       end
3×3 DataFrame
 Row │ text    ns              int   
     │ String  Nanoseco…       Int64 
─────┼───────────────────────────────
   1 │ one     10 nanoseconds     10
   2 │ two     20 nanoseconds     20
   3 │ three   30 nanoseconds     30

Pushing to the registry shortly.