bleearmstrong closed this 8 years ago
@bleearmstrong Interesting thought on the grouping. The best place to discuss it would be in the issues. You should open an issue describing the problem ("How should joins and tidy verbs handle grouped data frames?"), add some background, and perhaps mention the dplyr behavior.
I've adjusted some code so that `x >> join(y)` will maintain `x`'s grouping. Should I add that to this pull request or a future one? I'd prefer to add it to a future pull request, so I can tackle several grouping-related issues in a single pull request.
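A minimal sketch of the behavior described above, where a mutating join keeps the left frame's grouping. It assumes a hypothetical `Frame` type (a list of row dicts plus a list of grouping columns) standing in for `DplyFrame`, and a hypothetical `inner_join` function in place of the `x >> join(y)` pipe. It is not dplython's actual code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


@dataclass
class Frame:
    """Hypothetical stand-in for a grouped DplyFrame (not dplython's API)."""
    rows: List[Row]
    group_by: List[str] = field(default_factory=list)


def inner_join(x: Frame, y: Frame, by: Sequence[str]) -> Frame:
    """Inner join x to y on the `by` columns, keeping x's grouping."""
    # Index y's rows by key tuple so each row of x finds its matches directly.
    index: Dict[tuple, List[Row]] = {}
    for r in y.rows:
        index.setdefault(tuple(r[k] for k in by), []).append(r)
    joined = []
    for left in x.rows:
        for right in index.get(tuple(left[k] for k in by), []):
            # On non-key name clashes x's value wins (a simplification).
            joined.append({**right, **left})
    # The design choice under discussion: carry x's grouping through the join.
    return Frame(joined, list(x.group_by))
```

Letting `x`'s value win on column-name clashes keeps the sketch short; dplyr instead disambiguates clashing columns with suffixes.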
@bleearmstrong A few comments:
- `Verb` class: the idea is that sometimes we want to handle certain key verbs / delayed functions inside the `DplyFrame` differently. For those verbs, we can use `isinstance` to check their type.
- `join` functions as classes which inherit from `Verb`: now you don't need to import the `re` package and run it on every function the `DplyFrame` gets piped into.
- `do_ungroup` code outside the `DplyFrame` code: the challenge here is deciding where our delayed-function code should go. Inside `DplyFrame`, or in the functions themselves? The downside of putting it in the functions is that we repeat ourselves. The downside of putting it in `DplyFrame` is that people need to look in several places to understand what's going on. I think I prefer repeating ourselves and putting this code inside `join`, `count`, and the other functions that will manipulate grouping, but I'm open to arguments!

I'm probably going to cancel this pull request and move the mutating joins over to verbs before I work on the filtering verbs.
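The review points above about `Verb` can be sketched as follows: a `Verb` base class, type-based dispatch with `isinstance` in place of regex matching on function names, and grouping logic that lives inside each verb rather than in the frame. `Frame`, `GroupBy`, and `Ungroup` are hypothetical names for illustration, and implementing `>>` with `__rshift__` is an assumption about how the pipe could work, not a description of dplython.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


class Frame:
    """Hypothetical stand-in for DplyFrame: row dicts plus grouping columns."""

    def __init__(self, rows: List[Row], group_by: Sequence[str] = ()):
        self.rows = list(rows)
        self.group_by = list(group_by)

    def __rshift__(self, step):
        # Dispatch on type with isinstance instead of matching names with `re`.
        if isinstance(step, Verb):
            # Each Verb decides for itself what happens to the grouping.
            return step.apply(self)
        # Any other callable is treated as a plain row transform; keeping the
        # grouping here is an arbitrary choice for this sketch.
        return Frame(step(self.rows), self.group_by)


class Verb:
    """Base class for verbs that need special handling when piped into a Frame."""

    def apply(self, frame: Frame) -> Frame:
        raise NotImplementedError


class GroupBy(Verb):
    """Hypothetical verb: replace the grouping with the given columns."""

    def __init__(self, *cols: str):
        self.cols = list(cols)

    def apply(self, frame: Frame) -> Frame:
        return Frame(frame.rows, self.cols)


class Ungroup(Verb):
    """Hypothetical verb: drop all grouping; the rows are unchanged."""

    def apply(self, frame: Frame) -> Frame:
        return Frame(frame.rows, [])
```

Because the grouping policy sits in each `Verb` subclass, a `join` or `count` verb would repeat some bookkeeping, which is the trade-off the review comment accepts in exchange for keeping `Frame` simple.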
Add filtering joins

While I was working on implementing `spread()`, I realized the functions didn't work quite properly on grouped data. As of this pull request, grouping is removed when data is joined. In some cases this makes sense: we can think of a mutating join as creating a new data frame, so maybe grouping should be removed. For filtering joins, maybe not. For `spread` and `gather`, maybe not. How to handle grouping should be discussed not just for joins but for other functions too. Should that be discussed here, or is there somewhere more appropriate?
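One way to argue that filtering joins should keep their grouping: a semi join or anti join only drops rows of `x`, never adds columns from `y`, and never duplicates rows, so all of `x`'s grouping columns are still present and still meaningful. The sketch below assumes a hypothetical `Frame` (row dicts plus grouping columns); `semi_join` and `anti_join` here are illustrative, not dplython's implementation.

```python
from typing import Any, Dict, List, NamedTuple, Sequence, Set

Row = Dict[str, Any]


class Frame(NamedTuple):
    """Hypothetical stand-in for a grouped DplyFrame."""
    rows: List[Row]
    group_by: List[str]


def _match_keys(y: Frame, by: Sequence[str]) -> Set[tuple]:
    # Only key membership matters for a filtering join, so a set suffices.
    return {tuple(r[k] for k in by) for r in y.rows}


def semi_join(x: Frame, y: Frame, by: Sequence[str]) -> Frame:
    """Rows of x with at least one match in y; duplicate matches add nothing."""
    keys = _match_keys(y, by)
    kept = [r for r in x.rows if tuple(r[k] for k in by) in keys]
    # Filtering only drops rows of x, so x's grouping still describes the result.
    return Frame(kept, list(x.group_by))


def anti_join(x: Frame, y: Frame, by: Sequence[str]) -> Frame:
    """Rows of x with no match in y."""
    keys = _match_keys(y, by)
    kept = [r for r in x.rows if tuple(r[k] for k in by) not in keys]
    return Frame(kept, list(x.group_by))
```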