bkamins / Julia-DataFrames-Tutorial

A tutorial on Julia DataFrames package
MIT License

Buggy example? #37

Closed jjgomezcadenas closed 1 year ago

jjgomezcadenas commented 1 year ago

Hello and thanks for these wonderful tutorials.

I was trying to run the following example (I use Pluto, Julia 1.9 and DataFrames 1.5.0):

x = DataFrame(id=rand('a':'d', 100), v=rand(100))
combine(groupby(x, :id)) do sdf
    n = nrow(sdf)
    n < 25 ? DataFrame() : DataFrame(n=n) # drop groups with low number of rows
end

And got the following error:

UndefVarError: nrow not defined

_combine(::DataFrames.GroupedDataFrame{DataFrames.DataFrame}, ::Vector{Any}, ::Vector{Bool}, ::Bool, ::Bool, ::Bool, ::Bool)@splitapplycombine.jl:739
_combine_prepare_norm(::DataFrames.GroupedDataFrame{DataFrames.DataFrame}, ::Vector{Any}, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool)@splitapplycombine.jl:86
var"#_combine_prepare#671"(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(DataFrames._combine_prepare), ::DataFrames.GroupedDataFrame{DataFrames.DataFrame}, ::Base.RefValue{Any})@splitapplycombine.jl:51
_combine_prepare@splitapplycombine.jl:25[inlined]
combine#737@splitapplycombine.jl:845[inlined]
combine@splitapplycombine.jl:845[inlined]
combine#735@splitapplycombine.jl:830[inlined]
combine@splitapplycombine.jl:824[inlined]
top-level scope@Local: 1[inlined]

Then I tried to fix it:

combine(groupby(x, :id)) do sdf
    n = size(sdf)[1]
    n < 25 ? DataFrame() : DataFrame(n=n) # drop groups with low number of rows
end

And got this:

UndefVarError: DataFrame not defined

_combine(::DataFrames.GroupedDataFrame{DataFrames.DataFrame}, ::Vector{Any}, ::Vector{Bool}, ::Bool, ::Bool, ::Bool, ::Bool)@splitapplycombine.jl:739
_combine_prepare_norm(::DataFrames.GroupedDataFrame{DataFrames.DataFrame}, ::Vector{Any}, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool)@splitapplycombine.jl:86
var"#_combine_prepare#671"(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(DataFrames._combine_prepare), ::DataFrames.GroupedDataFrame{DataFrames.DataFrame}, ::Base.RefValue{Any})@splitapplycombine.jl:51
_combine_prepare@splitapplycombine.jl:25[inlined]
combine#737@splitapplycombine.jl:845[inlined]
combine@splitapplycombine.jl:845[inlined]
combine#735@splitapplycombine.jl:830[inlined]
combine@splitapplycombine.jl:824[inlined]
top-level scope@Local: 1[inlined]

The example is of particular interest to me, since my problem is precisely to drop groups with fewer than a certain number of rows.

Thanks!

jjgomezcadenas commented 1 year ago

Following up on my previous post, this works:

begin
    grouped_df = groupby(x, :id)
    combine(grouped_df, x -> size(x)[1] < 2 ? DataFrame() : x)
end

but not the original example in:

https://stackoverflow.com/questions/66484426/remove-groups-by-condition

which used nrow(x) rather than size(x)[1]
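
As a side note on the goal itself (dropping groups below a row-count threshold), a minimal sketch of an alternative is to call `filter` on the grouped data frame, which keeps whole groups for which the predicate returns `true`. The toy data, the `min_rows` threshold, and the variable names are illustrative assumptions, not taken from the tutorial, and the sketch assumes `nrow` resolves normally, as it does in the REPL.

```julia
using DataFrames

# Deterministic toy data: group 'a' has 3 rows, 'b' has 1 row, 'c' has 2 rows.
df = DataFrame(id=['a', 'a', 'a', 'b', 'c', 'c'], v=1:6)
min_rows = 2  # hypothetical threshold

# The predicate receives each group as a SubDataFrame.
# ungroup=true returns a plain DataFrame instead of a GroupedDataFrame.
kept = filter(sdf -> nrow(sdf) >= min_rows, groupby(df, :id); ungroup=true)

println(kept)
```

Unlike the `combine` version, nothing has to be returned for the dropped groups; the predicate only decides whether a group stays.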

bkamins commented 1 year ago

Can you please double-check that you have Pluto properly configured? I just ran the example in the REPL and it works as expected:

julia> combine(groupby(x, :id)) do sdf
           n = nrow(sdf)
           n < 25 ? DataFrame() : DataFrame(n=n) # drop groups with low number of rows
       end
2×2 DataFrame
 Row │ id    n
     │ Char  Int64
─────┼─────────────
   1 │ d        31
   2 │ c        29

jjgomezcadenas commented 1 year ago

So, it may be a bug in Pluto. I tried it in the REPL and it works. But in a fresh Pluto notebook with just these dependencies

begin
    import Pkg
    Pkg.activate(mktempdir())
    Pkg.add([Pkg.PackageSpec(name="DataFrames", version="1.5.0")])
    using DataFrames
end

it does not work:

x = DataFrame(id=rand('a':'d', 100), v=rand(100))

combine(groupby(x, :id)) do sdf
    n = nrow(sdf)
    n < 25 ? DataFrame() : DataFrame(n=n) # drop groups with low number of rows
end

UndefVarError: nrow not defined

_combine(::DataFrames.GroupedDataFrame{DataFrames.DataFrame}, ::Vector{Any}, ::Vector{Bool}, ::Bool, ::Bool, ::Bool, ::Bool)@splitapplycombine.jl:754
_combine_prepare_norm(::DataFrames.GroupedDataFrame{DataFrames.DataFrame}, ::Vector{Any}, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool)@splitapplycombine.jl:86
var"#_combine_prepare#701"(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(DataFrames._combine_prepare), ::DataFrames.GroupedDataFrame{DataFrames.DataFrame}, ::Base.RefValue{Any})@splitapplycombine.jl:51
_combine_prepare@splitapplycombine.jl:25[inlined]
combine#767@splitapplycombine.jl:860[inlined]
combine@splitapplycombine.jl:860[inlined]
combine#765@splitapplycombine.jl:845[inlined]
combine@splitapplycombine.jl:839[inlined]
top-level scope@Local: 1[inlined]

combine(groupby(x, :id)) do sdf
    n = size(sdf)[1]
    n < 25 ? DataFrame() : DataFrame(n=n) # drop groups with low number of rows
end

UndefVarError: DataFrame not defined

_combine(::DataFrames.GroupedDataFrame{DataFrames.DataFrame}, ::Vector{Any}, ::Vector{Bool}, ::Bool, ::Bool, ::Bool, ::Bool)@splitapplycombine.jl:754
_combine_prepare_norm(::DataFrames.GroupedDataFrame{DataFrames.DataFrame}, ::Vector{Any}, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool)@splitapplycombine.jl:86
var"#_combine_prepare#701"(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(DataFrames._combine_prepare), ::DataFrames.GroupedDataFrame{DataFrames.DataFrame}, ::Base.RefValue{Any})@splitapplycombine.jl:51
_combine_prepare@splitapplycombine.jl:25[inlined]
combine#767@splitapplycombine.jl:860[inlined]
combine@splitapplycombine.jl:860[inlined]
combine#765@splitapplycombine.jl:845[inlined]
combine@splitapplycombine.jl:839[inlined]
top-level scope@Local: 1

jjgomezcadenas commented 1 year ago

OK, I removed and reinstalled Pluto, recompiled everything, and this time it works. Sorry about the hassle, and thanks!

bkamins commented 1 year ago

Thank you! Still, @fonsp might know about this issue, as this may be a general problem in Pluto.jl.

fonsp commented 1 year ago

What Pluto version are you using? I cannot reproduce https://github.com/bkamins/Julia-DataFrames-Tutorial/issues/37#issuecomment-1548322647 on latest Pluto, Julia 1.9.0

Revolutionary program 25.html.zip

jjgomezcadenas commented 1 year ago

I am now using Julia 1.9 and the latest version of Pluto. Under these conditions, the code runs. Previously I may have had some incompatibility: I had moved to Julia 1.9 but had not reinstalled Pluto. What I did was remove Pluto, install it again, and precompile, and it works now.
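
For anyone who hits the same `UndefVarError`, a quick sanity check along these lines may help show whether the session actually sees the DataFrames exports. It is only a sketch: it assumes Julia 1.9 or later (for `pkgversion`), and the variable names are made up for illustration.

```julia
using DataFrames

julia_version = VERSION               # version of the running Julia process
df_version = pkgversion(DataFrames)   # version of DataFrames loaded in this session

# true when the exported names resolve in the current module (e.g. a notebook cell)
nrow_visible = isdefined(@__MODULE__, :nrow)
dataframe_visible = isdefined(@__MODULE__, :DataFrame)

println("Julia: ", julia_version)
println("DataFrames: ", df_version)
println("nrow visible: ", nrow_visible, ", DataFrame visible: ", dataframe_visible)
```

If either flag is `false`, or the versions differ from what you expect, that points at the session or environment rather than at the example itself.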