TheTS / palmsplusr

palmsplusr: PALMS post-processing
http:/thets.github.io/palmsplusr/
GNU Lesser General Public License v3.0
5 stars 1 forks source link

updating palmsplusr to align with change in dplyr #15

Open vincentvanhees opened 1 year ago

vincentvanhees commented 1 year ago

Hi Tom, with the latest version of dplyr and running palmsplusr I get the error:

Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped
  data frame and adjust accordingly.

I think this means that we need to replace summarise() for reframe(). You are probably more familiar with the tidyverse jargon. Do you know what we should do with the last part of the warning? Does this mean we need to group our data and if so how do I do that?

TheTS commented 1 year ago

Hi Vincent

Ah yes, I saw they recently added reframe . I have used summarise on a grouped dataframe in several places, but I'm not sure which one would cause that warning message. One example is in trajectories:

  data %>%
    filter(tripnumber > 0) %>%
    group_by(identifier, tripnumber) %>%
    summarise(!!!args, do_union = FALSE) %>% # This line may need to be changed
    st_cast("LINESTRING") %>% 
    mutate(!!!args_after) %>%
    mutate_if(is.logical, as.integer)

In the code above, since the summarise function is used on a grouped dataframe, the dataframe returned by summarise is still grouped by identifier and tripnumber. This means the st_cast function is also applied to a dataframe grouped by identifier and tripnumber.

If summarise is changed to reframe, then I assume the dataframe needs to be regrouped before it is fed into the rest of the pipeline. Something like:

  data %>%
    filter(tripnumber > 0) %>%
    group_by(identifier, tripnumber) %>%
    reframe(!!!args, do_union = FALSE) %>% # changed to reframe()
    group_by(identifier, tripnumber) %>% # regroup the dataframe
    st_cast("LINESTRING") %>% 
    mutate(!!!args_after) %>%
    mutate_if(is.logical, as.integer)

I assume this also applies to summarise_at and summarise_all, which are also used in several places. These functions were actually superseded a while ago in favour of across, but they still work.

Edit: I should also say that it looks like reframe is only used when summarise returns more/less than one row per group. I guess the first step is figuring out when that occurs (and then figuring out if a grouped dataframe is involved).

vincentvanhees commented 1 year ago

Thanks for the quick reply Tom. Ok, I found one issue with the pre-processing, which is not part of palmsplusr itself (code you drafted for Line and which I then re-used for HabitusGUI). There, replacing summarise() by reframe() seemed sufficient. Next, I got stuck with a problem with the columnnames of the PALMS output that were shifted by one cell, at least not related to palmsplusr.

I am now able to run palmsplusr again, and will have to review that the output is correct. Also, I will have a look whether I can identify a way to address the warnings about the deprecated functions.