JuliaData / SplitApplyCombine.jl

Split-apply-combine strategies for Julia
Other
144 stars 15 forks source link

Even more defaults #12

Closed bramtayl closed 5 years ago

codecov-io commented 5 years ago

Codecov Report

Merging #12 into master will decrease coverage by 41.57%. The diff coverage is 0%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master      #12       +/-   ##
===========================================
- Coverage   97.89%   56.31%   -41.58%     
===========================================
  Files           9       10        +1     
  Lines         237      412      +175     
===========================================
  Hits          232      232               
- Misses          5      180      +175
Impacted Files Coverage Δ
src/innerjoin.jl 28.57% <0%> (-64.77%) :arrow_down:
src/leftgroupjoin.jl 43.75% <0%> (-56.25%) :arrow_down:
src/group.jl 48.75% <0%> (-51.25%) :arrow_down:
src/combinedims.jl 52.17% <0%> (-47.83%) :arrow_down:
src/map.jl 60.71% <0%> (-39.29%) :arrow_down:
src/invert.jl 67.67% <0%> (-30.86%) :arrow_down:
src/product.jl 70.83% <0%> (-23.62%) :arrow_down:
src/splitdims.jl 72% <0%> (-22.74%) :arrow_down:
src/single.jl 90% <0%> (-10%) :arrow_down:
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 529ced1...75021f3. Read the comment docs.

andyferris commented 5 years ago

This is an interesting one. I had left these off ignored these for now because I can see two good choices here.

  1. identity like you have.
  2. Something to facilitate a "natural" join between two tables, for example so that these project to the common propertynames.

The thing with 1. is that it basically makes innerjoin behave a bit like intersect by default (at least for inputs with distinct values). I'm trying to imagine a use case!

The thing with 2. is that it's quite a lot less generic than the other functions here. It's more Tables.jl centric. But, in that particular use case, it is quite compelling. I could use naturaljoin and the operator for this, so not to affect innerjoin.

Thoughts?

bramtayl commented 5 years ago

What about a match function called ‘same’ (which tests for equality only at shared propertynames)

bramtayl commented 5 years ago

Ok, so here's a start:

import Base: diff_names
function intersect_names(data1, data2)
    data1_names = propertynames(data1)
    data2_names = propertynames(data2)
    Names{diff_names(data1_names, diff_names(data1_names, data2_names))}()
end

default_key(rows1, rows2) = intersect_names(first(rows1), first(rows2))
bramtayl commented 5 years ago

Ok, I added in_common back to LightQuery, and if its useful it might make it into the next version. So you could add innerjoin(left, right) = innerjoin(in_common(first(left), first(right)), left, right). Of course, the default would only work for non-empty iterators.

bramtayl commented 5 years ago

PS the argument order of inner_join is a bit off. Why not make all of these functions keyword arguments instead?

andyferris commented 5 years ago

It’s a good idea. The signature was prototyped in Julia v0.6 so the keyword arguments weren’t fast yet.

I also wonder if we have the same problem with too many positional function arguments in mapreduce...

Of course group and so-on might also benefit from keyword arguments.