TidierOrg / TidierData.jl

Tidier data transformations in Julia, modeled after the dplyr/tidyr R packages.
MIT License

Issue with auto-vectorization #93

Closed. kdpsingh closed this issue 7 months ago

kdpsingh commented 7 months ago

See here for an example of what we need to fix: https://discourse.julialang.org/t/tidier-creating-random-columns/111220

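I think the root cause is that auto-vectorization is not handling this expression correctly.

These two code blocks should produce the same kind of result (a `split` column of 10 random Float64 values), but the second returns an Int64 column filled with 10: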

julia> @chain df begin
         # @group_by(reference)
         @mutate(split = ~rand(10))
       end
10×2 DataFrame
 Row │ reference  split     
     │ Bool       Float64   
─────┼──────────────────────
   1 │      true  0.514911
   2 │      true  0.543143
   3 │     false  0.986756
   4 │     false  0.900573
   5 │     false  0.896504
   6 │      true  0.458367
   7 │      true  0.128537
   8 │      true  0.63249
   9 │     false  0.0360895
  10 │      true  0.636696

julia> @chain df begin
         # @group_by(reference)
         @mutate(split = ~rand(n()))
       end
10×2 DataFrame
 Row │ reference  split 
     │ Bool       Int64 
─────┼──────────────────
   1 │      true     10
   2 │      true     10
   3 │     false     10
   4 │     false     10
   5 │     false     10
   6 │      true     10
   7 │      true     10
   8 │      true     10
   9 │     false     10
  10 │      true     10

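For reference, here is a minimal sketch in plain Julia (no TidierData involved) of the equivalence the two blocks above are expected to preserve. The variable `n` is only a stand-in for the row count that `n()` should supply, which is an assumption for illustration and not how TidierData evaluates `n()` internally.

```julia
using Random

n = 10  # hypothetical stand-in for the value n() is expected to return

Random.seed!(42)
a = rand(10)   # length given as a literal

Random.seed!(42)
b = rand(n)    # same length given through a variable

# With the same seed, both calls yield the same 10 Float64 values.
@assert a == b
@assert eltype(b) == Float64 && length(b) == n
```

In base Julia the two calls are interchangeable, so the Int64 column in the second output points to how the macro rewrites the expression and not to `rand` itself.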
kdpsingh commented 7 months ago

This is fixed in #94.