JuliaData / DataFramesMeta.jl

Metaprogramming tools for DataFrames
https://juliadata.github.io/DataFramesMeta.jl/stable/

suggestion - @chainWithRTransform #325

Open Lincoln-Hannah opened 2 years ago

Lincoln-Hannah commented 2 years ago

A macro similar to @chain, but one that treats any line that isn't another macro call as if it were inside an @rtransform @astable block. What would currently be written as:

using DataFramesMeta  # also re-exports DataFrames (DataFrame) and Chain.jl (@chain)

@chain DataFrame(A = 1:10) begin

    @rtransform @astable begin
        :B = mod(:A, 3)
        :E = :B * 2
    end

    @rsubset :B == 1

    @rtransform @astable begin
        :F = :B * 2
        :G = :F + 1
    end

    @orderby :A

    @rtransform :H = :G * 2

end

Could be written as:

@chainWithrTransform DataFrame(A = 1:10) begin

    :B = mod(:A, 3)
    :E = :B * 2

    @rsubset :B == 1

    :F = :B * 2
    :G = :F + 1

    @orderby :A

    :H = :G * 2

end
pdeffebach commented 9 months ago

I'm coming around to this being a good idea. It would certainly cut down on a lot of typing, and it does seem true that 90% of commands are @rtransform.

Maybe @nalimilan can chime in with their thoughts, because this would be a pretty non-standard syntax transformation.

bkamins commented 9 months ago

How would it combine with grouping/ungrouping data frames in the process? (I think it would be OK, but I want to make sure.)

nalimilan commented 9 months ago

Technically, would it be possible to support passing macro calls like @rsubset or @orderby inside @rtransform df begin... end, so that we don't need a new macro like @chainWithrTransform?

That would make https://github.com/JuliaData/DataFramesMeta.jl/pull/376/ a bit less ad-hoc.

pdeffebach commented 9 months ago

@bkamins I think it would only apply to row-wise operations, so grouping would in general be ignored.

@nalimilan I'm not sure how that would work. Are you suggesting something like

@chain_r_transform df begin
    :y = :x * 2
    @rsubset ...
    @orderby ...
end

or something else?

I also disagree that #376 (@when) is that ad-hoc. It honestly seems like one of the simpler ways to provide the complicated functionality that imitates Stata's if.

Lincoln-Hannah commented 8 months ago

@bkamins Maybe a better way to think of it:

  1. First change any line of the form :ColumnName = ... to @rtransform :ColumnName = ....
  2. Then proceed as per a usual @chain block.

So this

@chainWithrTransform DataFrame(A = 1:10) begin

    :B = mod(:A, 3)
    :E = :B * 2

    @rsubset :B == 1

    :F = :B * 2
    :G = :F + 1

    @orderby :A

    :H = :G * 2

end

becomes

@chain DataFrame(A = 1:10) begin

    @rtransform :B = mod(:A, 3)
    @rtransform :E = :B * 2

    @rsubset :B == 1

    @rtransform :F = :B * 2
    @rtransform :G = :F + 1

    @orderby :A

    @rtransform :H = :G * 2

end

It's just a bit of syntax sugar. As @pdeffebach says, 90% of lines are @rtransform, and this saves you from having to keep writing it.
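The two-step rule described above (rewrite each top-level `:ColumnName = ...` line, then proceed as a normal @chain block) is a plain syntax transformation, so it can be sketched with Base Julia alone. This is a minimal sketch, not anything that exists in DataFramesMeta.jl: `is_column_assignment` and `rewrite_rtransform` are hypothetical names, and the function only builds the rewritten expression; it does not run it.

```julia
# Minimal sketch of the proposed rewrite; NOT DataFramesMeta.jl API.
# `is_column_assignment` and `rewrite_rtransform` are hypothetical names.

# True for `:col = expr`, i.e. an assignment whose left-hand side is a quoted
# column name (handles `:col` parsed as a QuoteNode or as a :quote Expr).
function is_column_assignment(ex)
    ex isa Expr && ex.head == :(=) || return false
    lhs = ex.args[1]
    return lhs isa QuoteNode || (lhs isa Expr && lhs.head == :quote)
end

# Wrap every top-level column assignment in `@rtransform`. Every other line
# (e.g. `@rsubset` or `@orderby` calls) is kept unchanged, and nested blocks
# are never visited.
function rewrite_rtransform(block::Expr)
    block.head == :block || throw(ArgumentError("expected a begin ... end block"))
    out = Expr(:block)
    lastline = nothing  # most recent source location, reused for the new macro call
    for line in block.args
        if line isa LineNumberNode
            lastline = line
            push!(out.args, line)
        elseif is_column_assignment(line)
            push!(out.args, Expr(:macrocall, Symbol("@rtransform"), lastline, line))
        else
            push!(out.args, line)
        end
    end
    return out
end

# Usage: quote the block so its macros are not expanded, then rewrite it.
ex = quote
    :B = mod(:A, 3)
    :E = :B * 2
    @rsubset :B == 1
end
rewritten = rewrite_rtransform(ex)
println(rewritten)
```

Working on the quoted expression keeps the sketch independent of DataFramesMeta.jl; a real macro would apply the same rewrite to its block argument before handing it to @chain.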

bkamins commented 8 months ago

So the first assignment operation present in the block drops grouping.

Lincoln-Hannah commented 8 months ago

Lines of the form :columnName = ... are changed to @rtransform :columnName = ..., but not if they are within a sub-block (e.g. @by or @transform):

@chainWithrTransform begin

    DataFrame(A=1:4)

    :B = mod(:A,2)

    @by :B begin
        :A = mean(:A)
    end

    :C = :B / :A

    @transform :sumA = sum(:A) 

end

becomes

@chain begin

    DataFrame(A=1:4)

    @rtransform :B = mod(:A,2)

    @by :B begin
        :A = mean(:A)                # unchanged since it is within the @by block
    end

    @rtransform :C = :B / :A

    @transform :sumA = sum(:A)          # unchanged since it is within the @transform block
end
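The nested-block rule above falls out naturally if only the block's own top-level lines are inspected. A minimal sketch of such a macro, assuming the hypothetical name `@chain_rtransform` (not part of DataFramesMeta.jl): it rewrites those lines and hands the result to Chain.jl's `@chain`, so assignments inside `@by :B begin ... end` or on a `@transform` line are never visited. Defining it needs only Base; actually running it would additionally require `using DataFramesMeta` for `@chain`, `@rtransform` and the other macros it emits.

```julia
# Hypothetical macro; a sketch of the proposal, NOT DataFramesMeta.jl API.
# Accepts both `@chain_rtransform df begin ... end` and `@chain_rtransform begin ... end`.
macro chain_rtransform(args...)
    block = args[end]
    block isa Expr && block.head == :block ||
        throw(ArgumentError("last argument must be a begin ... end block"))
    iscolassign(ex) = ex isa Expr && ex.head == :(=) &&
        (ex.args[1] isa QuoteNode || (ex.args[1] isa Expr && ex.args[1].head == :quote))
    # Only the block's own lines are mapped. A nested block such as
    # `@by :B begin :A = mean(:A) end` is a macro call, so it passes through unchanged.
    lines = map(block.args) do line
        iscolassign(line) ? Expr(:macrocall, Symbol("@rtransform"), nothing, line) : line
    end
    chain = Expr(:macrocall, Symbol("@chain"), __source__, args[1:end-1]..., Expr(:block, lines...))
    return esc(chain)  # emitted code is resolved in the caller's scope
end
```

With DataFramesMeta.jl loaded, it would be written like the @chainWithrTransform example above. Note that, as discussed in this thread, the first rewritten assignment would still drop any grouping, since each line becomes an ordinary @rtransform call.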
pdeffebach commented 8 months ago

> So the first assignment operation present in the block drops grouping.

Yeah. It would drop grouping.

Lincoln-Hannah commented 3 months ago

Are you guys still interested in implementing this? I just realised again how many times I write @rtransform.