Closed bkamins closed 4 years ago
CC @pdeffebach, @CameronBieganek
To clarify the issue is as follows, right?
function foo(x, y)
x .+ y .- mean(x)
end
select(df, Cols(:x1, :x1) => foo)
errorsselect(df, Cols(:x1, :x1) => foo)
works. My intuition is to throw an error. Given that Cols
promises no guarantee of ordering in the vast majority of scenarios, the user shouldn't rely on a 1:1 behavior of Cols
in this context. They can do
select(df, [:x1, :x1] => foo)
If they want, right?
@pdeffebach No, the idea is that if it didn't error, Cols(:x1, :x1)
would be equivalent to [:x1]
(contrary to [:x1, :x1]
).
As usual, we can throw an error for now and relax the rules later if needed.
Sorry, I meant the error at the call of foo
, which requires two arguments.
The behavior you described is the correct behavior. I believe Cols(:x1, :x1)
should be equivalent to [:x1]
.
I believe
Cols(:x1, :x1)
should be equivalent to[:x1]
.
What is the benefit of allowing this over throwing an error?
I guess it could make sense if these are stored in variables which can take multiple values. Though half of the time it would probably indicate a bug and you can always call unique
first.
What is the benefit of allowing this over throwing an error?
I think it would be frustrating for users if
Cols(x #= Container which contains :x1 =#, y #= Container which contains :x1=#) # Between(), etc.
worked but not
Cols(:x1, :x1)
half of the time it would probably indicate a bug
This is my understanding of the situation, but given there is no support for it it is even easier to have Cols
behave by always taking a union.
Cols(:x1, :x1) == Cols(:x1)
makes sense to me. That's consistent with the
Cols(A, B, C) ==> (A, B\A, C\(A ∪ B))
behavior I mentioned in the other thread.
It is consistent when A
, B
, C
are sets. I considered :x1
to be a separate case as it is not a set but a set element.
I'm willing to consider :x1
to be a set with one element. I'd hate to have to write [:x1]
all the time. 🙂
I don't have a strong opinion here, but I would say it's fine to allow it. If one of them is not a symbol literal, this could be useful (say Cols(:x1, col)
and col
is set programmatically).
The only analogous thing I can think of is for example repeated named arguments in julia Base, and there they are only forbidden if they are passed "not programmatically":
julia> (; a = 1, a = 2)
ERROR: syntax: field name "a" repeated in named tuple
Stacktrace:
[1] top-level scope at REPL[1]:1
[2] eval(::Module, ::Any) at ./boot.jl:331
[3] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
[4] run_backend(::REPL.REPLBackend) at /home/pietro/.julia/packages/Revise/WkyNB/src/Revise.jl:1023
[5] top-level scope at none:0
julia> (; a = 1, :a => 2)
(a = 2,)
As we do not have a good way to test if the symbol was a literal or came from some variable, the simplest is probably to just allow it.
We had a discussion with @nalimilan how exactly
Cols
should work.The question is if:
should be an allowed selection rule.
What I mean is that clearly things like:
should be allowed (taking a union), but should we allow taking a union of single column selectors.
As a corner case:
would be allowed (as
[:x1]
selects multiple columns - just in this case it is just a single column).The rationale is that it is unlikely that:
is an intended thing (probably this is an error on the user side).
@piever - do you have any opinion on this?