This PR adds most of the functionality of separate from tidyr. There are a few differences, listed below; there are probably more, but these are the ones I'm aware of. Otherwise, the general behavior should be consistent with tidyr's.
Negative indexing is not currently allowed.
On a grouped dataframe, if the variable to be split is also a grouping variable and is removed, tidyr throws an error. This version instead warns the user that the variable will be removed from both the dataframe and the grouping; the other grouping variables still apply.
Left filling is slow. In tidyr, the fill direction doesn't seem to make much of a difference, but in this PR, as a benchmark on the diamonds dataset, a right fill takes about 0.7 seconds and a left fill takes about 25 seconds. That is still about 2000 rows per second, so performance isn't horrible, but I'd like to improve it. I believe this is a consequence of using apply, which can be slow in some cases. Also, I'm not sure how popular left filling would be in the first place; right filling seems more common. We may want to wait until we can speed this up before merging.
In tidyr, if you pass a non-string column, it doesn't tell you that it's not a character column, and it returns blank columns where you specified the new columns. This version uses pandas methods, so it raises a pandas error instead. It would be easy to raise a dplython-specific error, but I wasn't sure how much dplython-specific error handling should be done.
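
To make the fill behavior and the non-string case concrete, here is a rough sketch in plain pandas and NumPy. It is not the code in this PR: `separate_column` is a hypothetical helper, the `TypeError` is just one way a dplython-specific error could look, and the left fill is done by shifting columns with array indexing rather than a row-wise apply, which is one possible way to speed it up.

```python
import numpy as np
import pandas as pd


def separate_column(series, into, sep="_", fill="right"):
    """Hypothetical helper: split a string Series into len(into) columns.

    Rows with too few pieces are padded with NaN on the right
    (fill="right") or on the left (fill="left"). Extra pieces are dropped.
    """
    # Illustrative up-front check instead of letting pandas raise later.
    if not pd.api.types.is_string_dtype(series):
        raise TypeError(f"expected a string column, got dtype {series.dtype}")

    n = len(into)
    # expand=True pads short rows on the right; reindex forces exactly n columns.
    parts = series.str.split(sep, expand=True).reindex(columns=range(n))

    if fill == "left":
        values = parts.to_numpy(dtype=object)
        # Number of missing pieces per row = how far to shift that row right.
        shift = n - parts.notna().sum(axis=1).to_numpy()
        cols = np.arange(n) - shift[:, None]      # source column for each cell
        rows = np.arange(len(values))[:, None]
        shifted = values[rows, np.clip(cols, 0, None)]
        # Cells whose source column would be negative become the padding.
        values = np.where(cols >= 0, shifted, np.nan)
        parts = pd.DataFrame(values, index=series.index)

    parts.columns = into
    return parts


s = pd.Series(["a_b_c", "d_e", "f"])
right = separate_column(s, ["x", "y", "z"])               # pads on the right
left = separate_column(s, ["x", "y", "z"], fill="left")   # pads on the left
```

With this sketch, the row "d_e" ends up with its missing value in z for a right fill and in x for a left fill. Whether the array-based shift is actually faster on the diamonds dataset would need to be measured.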