Closed bkamins closed 1 year ago
Base: 96.36% // Head: 96.36% // No change to project coverage :thumbsup:
Coverage data is based on head (6fc6ae4) compared to base (69313ee). Patch coverage: 100.00% of modified lines in pull request are covered.
CC @jar @pdeffebach @yjunechoe
I think we discussed this in the previous issue but I forgot. Could you remind me why we use keywords here rather than a function? Then it would have `Cols(...; operation=any)` and `Cols(...; operation=all)` using the built-in functions `any` and `all`, and could use user-defined predicates.
We have not discussed it previously. Actually, what you propose is a good idea. But `any` and `all` are not the correct functions to pass. We can say that `union` is the default and `intersect` is the function to pass if one wants the intersection. However, the user could potentially pass any function that operates on sets and returns a set.
Then the definition in DataFrames.jl would be:
@inline Base.getindex(x::AbstractIndex, idx::Cols) =
    isempty(idx.cols) ? Int[] : idx.operation(getindex.(Ref(x), idx.cols)...)
It would be more error prone than the current implementation (potentially leading to crazy error stack traces) but more flexible indeed. @nalimilan - what do you think?
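To make the trade-off concrete, here is a minimal self-contained sketch of the same idea. `SketchCols`, `resolve`, and `select_positions` are hypothetical stand-ins written for illustration, not the actual DataAPI.jl or DataFrames.jl types; only the shape of the one-liner above is taken from the proposal.

```julia
# Hypothetical stand-in for Cols: stores the selectors and a set function.
struct SketchCols{F,T<:Tuple}
    cols::T
    operation::F
    SketchCols(cols...; operation=union) =
        new{typeof(operation),typeof(cols)}(cols, operation)
end

# Hypothetical resolver: map one selector to column positions.
resolve(colnames::Vector{Symbol}, sel::Regex) =
    findall(n -> occursin(sel, String(n)), colnames)
resolve(colnames::Vector{Symbol}, sel::Symbol) =
    findall(==(sel), colnames)

# Same shape as the proposed getindex: no selectors means no columns,
# otherwise the set function is applied to all resolved position vectors.
select_positions(colnames, idx::SketchCols) =
    isempty(idx.cols) ? Int[] :
    idx.operation(map(c -> resolve(colnames, c), idx.cols)...)

colnames = [:a1, :a2, :b1, :b2]

by_union     = select_positions(colnames, SketchCols(r"^a", r"1$"))
by_intersect = select_positions(colnames, SketchCols(r"^a", r"1$"; operation=intersect))
by_setdiff   = select_positions(colnames, SketchCols(r"^a", r"1$"; operation=setdiff))
by_symdiff   = select_positions(colnames, SketchCols(r"^a", r"1$"; operation=symdiff))
```

Nothing in this sketch checks that the function passed as `operation` returns a collection of positions, which is where the less friendly error messages would come from.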
Makes sense. Then we'd also get `setdiff` and `symdiff` for free.
Yep, I agree with adienes.
Yeah, why not. Maybe we need a more specific name than "operation" for the argument, then. Again the same discussion as the one about "combine" in `unstack`? :-)
AFAICT this fits well. In mathematics `union`, `intersect`, etc. are typically called "operations", cf. e.g. https://en.wikipedia.org/wiki/Set_(mathematics)#Basic_operations
I have updated the PR.
Perhaps `operator` is very slightly a better choice than `operation` for some very tenuous reasons:
- `Cols` will be indicator/characteristic functions of sets, and a function that acts on other functions is often called an operator
- `operator=` should be a verb (i.e. a function)

The Wikipedia page for the Algebra of Sets uses both terms interchangeably, so I'm not sure which is more standard.
I checked several references, and the border between `operation` and `operator` is thin. Operation (mathematics) gives the clearest distinction:
An operator is similar to an operation in that it refers to the symbol or the process used to denote the operation, hence their point of view is different. For instance, one often speaks of "the operation of addition" or "the addition operation", when focusing on the operands and result, but one switches to "addition operator" (rarely "operator of addition"), when focusing on the process
Given this explanation, `operator` indeed seems better, but I am not sure.
Especially when the symbols `∩` and `∪` are used instead of `intersect` and `union`, `operator` definitely seems like the better word.
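As a small illustration of the point about symbols: in Base Julia `∪` and `∩` are aliases of `union` and `intersect`, so, assuming the keyword ends up being called `operator`, writing `operator=∩` should behave the same as `operator=intersect`. The sketch below uses only Base functions; the variable names are made up for the example.

```julia
# The Unicode set symbols are the very same function objects as the named ones.
same_union     = (∪) === union
same_intersect = (∩) === intersect

a = [1, 2]
b = [1, 3]

infix_result = a ∩ b            # infix form, reads like the math notation
named_result = intersect(a, b)  # equivalent call by name
```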
changed
Thank you!
x-ref https://github.com/JuliaData/DataFrames.jl/pull/3224