Lincoln-Hannah opened this issue 2 months ago
Hi @Lincoln-Hannah, sorry for the delay in getting back to you. This is a solid idea. I want to share some initial thoughts on why `@mutate()` currently works the way it does and how we might get closer to what you are looking for.
Right now, `@mutate()` supports the multi-line syntax you propose here, but it doesn't support cases where one argument relies on a variable created in a previous argument. In the example above, `c = 3b` relies on the existence of `b`, which was created in the previous argument. This behavior is intentional: the limitation comes from `DataFrames.transform()`, which evaluates every argument against the original data frame rather than against columns created by earlier arguments in the same call. DataFrames does this for performance reasons: it assumes the arguments are independent, so they can be parallelized and run faster.
There are 2 ways that we could fix this:

1. A new macro, like the `@chainWithMutate()` macro you propose above. I don't like that name (because it would be used inside an existing `@chain` macro), but we could consider an alternative name like `@mutates()`, where the `s` makes it look plural and stands for "sequential".
2. Updating the `@mutate()` macro to analyze the variables being created (e.g., `b` and `c`) and the variables being used (e.g., `a` and `b`), and to automatically run the arguments sequentially in separate calls to `DataFrames.transform()` if a dependency is detected.

This would be more of a new feature than a bug fix, so it's slightly lower priority, but I think option 2 is doable and is something we should pursue.
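Option 2 comes down to a dependency check on the assignment expressions. The following is a minimal, Base-only sketch, assuming each `@mutate()` argument has the form `name = expr`; `used_symbols!` and `split_into_batches` are hypothetical helper names, not part of TidierData. It groups the assignments into batches so that no assignment reads or overwrites a column created earlier in its own batch, meaning each batch could be sent to a single `DataFrames.transform()` call.

```julia
# Hypothetical helper: collect every Symbol referenced in an expression.
# Function names such as `*` and `+` are collected too; that is harmless here
# because they will not match column names created in the same batch.
function used_symbols!(acc::Set{Symbol}, ex)
    if ex isa Symbol
        push!(acc, ex)
    elseif ex isa Expr
        for arg in ex.args
            used_symbols!(acc, arg)
        end
    end
    return acc
end

# Hypothetical helper: split `name = expr` assignments into batches so that no
# assignment reads (or overwrites) a column created earlier in its own batch.
# Each batch could then become one DataFrames.transform() call.
function split_into_batches(assignments::Vector{Expr})
    batches = Vector{Vector{Expr}}()
    current = Expr[]
    created = Set{Symbol}()          # columns created by the current batch
    for ex in assignments
        @assert(ex.head == :(=) && ex.args[1] isa Symbol, "expected `name = expr`")
        lhs = ex.args[1]
        used = used_symbols!(Set{Symbol}(), ex.args[2])
        # Start a new batch if this assignment depends on, or overwrites,
        # a column created earlier in the current batch.
        if lhs in created || any(s -> s in created, used)
            push!(batches, current)
            current = Expr[]
            empty!(created)
        end
        push!(current, ex)
        push!(created, lhs)
    end
    isempty(current) || push!(batches, current)
    return batches
end

batches = split_into_batches([:(b = 2a), :(c = 3b), :(d = a + 1)])
# `c = 3b` needs `b`, so it starts a second batch; `d = a + 1` only reads the
# original column `a`, so it can join that second batch.
```

Assignments without dependencies stay together, so the common case would still be a single `transform()` call. A real implementation would also need to handle the rest of the syntax `@mutate()` accepts, which this sketch ignores.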
Option 2 is fine. Thank you for considering it.
`@chainWithMutate` would be more difficult to implement. The idea is that it would be used instead of a `@chain` macro (not inside one). All the other macros would work within it, but if a line started with `variable =`, it would be treated as a `@mutate` line. I've found that two-thirds of the lines I write within a `@chain` block are `@mutate` lines, and they are often interspersed with `@filter` and other macros. It would just be cleaner if I didn't have to keep repeating `@mutate`.
```julia
@chainWithMutate begin
    DataFrame(a = 1:10)
    b = 2a
    @filter b > 10
    c = 2b
end
```
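The rewriting step described above can be sketched in plain Julia. This assumes the block is a single `begin ... end` whose first line supplies the data frame, and that `@chain` and `@mutate` are in scope wherever the macro is used (e.g. after `using TidierData`). `add_mutate_calls` is a hypothetical helper, and this is not a TidierData API.

```julia
# Hypothetical sketch: rewrite a `begin ... end` block so that every line of the
# form `name = expr` (after the first line) becomes `@mutate(name = expr)`.
# The first line is left alone because it supplies the initial data frame.
function add_mutate_calls(block::Expr)
    block.head == :block || error("expected a begin ... end block")
    out = Expr(:block)
    seen_first = false
    line = LineNumberNode(0, :none)   # most recent source location seen
    for arg in block.args
        if arg isa LineNumberNode
            line = arg
            push!(out.args, arg)
        elseif seen_first && arg isa Expr && arg.head == :(=)
            # `b = 2a` becomes `@mutate(b = 2a)`; @chain then inserts the
            # previous result as the macro's first argument.
            push!(out.args, Expr(:macrocall, Symbol("@mutate"), line, arg))
        else
            push!(out.args, arg)
            seen_first = true
        end
    end
    return out
end

# Hypothetical macro: rewrite the block, then hand it to @chain. `esc` makes
# `@chain` and `@mutate` resolve in the calling module, where they must exist.
macro chainWithMutate(block)
    esc(Expr(:macrocall, Symbol("@chain"), __source__, add_mutate_calls(block)))
end
```

Under those assumptions, the example above would reach `@chain` with `@mutate(b = 2a)`, `@filter b > 10`, and `@mutate(c = 2b)` lines. One trade-off of this rule is that every top-level `name = expr` line after the first is treated as a mutation, so ordinary variable assignments cannot be used inside the block.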
Ah, I see what you mean. We probably won't add this macro to the package, but it's definitely doable. I can try to put together a code snippet as a starting point if that would be of interest.
Very much so.
I really think that if people used it, they would like it. I've written so many `@chain` blocks with lots of `@mutate` lines interspersed with `@filter`, `@pivot`, and `@join` lines. Not having to write `@mutate` every time would save a lot of code.
Would you consider creating a `@chainWithMutate` macro that has one difference from the standard `@chain` macro? If a line begins with `variablename =` and a DataFrame is passed from the line above, then it treats the line as if it started with `@mutate`.
So instead of writing:
one could just write: