davidcarslaw / openair

Tools for air quality data analysis
https://davidcarslaw.github.io/openair/
GNU General Public License v2.0

PercentileRose Error: Summarise Function Deprecated #368

Closed: Cuevman81 closed this issue 1 year ago

Cuevman81 commented 1 year ago

When running percentileRose() in RStudio, I get a warning about the deprecation of summarise().

The call and the resulting warning:

percentileRose(aq_MetHourly_DeSoto, pollutant = "pm25")
Warning message:
Returning more (or less) than 1 row per summarise() group was deprecated in dplyr 1.1.0.
ℹ Please use reframe() instead.
ℹ When switching from summarise() to reframe(), remember that reframe() always returns an ungrouped data frame and adjust accordingly.
ℹ The deprecated feature was likely used in the openair package.
  Please report the issue at https://github.com/davidcarslaw/openair/issues.
This warning is displayed once every 8 hours.

jack-davison commented 1 year ago

Hi Rodney,

Thanks for flagging this. This is due to an update in {dplyr} and should be an easy thing for us to sort.

The function is still working as intended despite this warning message, so for the time being you can feel confident running percentileRose() until it's fixed.

Thanks, Jack

mooibroekd commented 1 year ago

@jack-davison I recently encountered the same warning message. Typically, summarise() is called after a group_by() on the data. While the deprecation message suggests using reframe(), I found that the newly introduced .by argument of summarise() also seems to fix the issue. With .by, the separate group_by() is no longer needed.

I also found it peculiar that this message surfaces when more grouping variables are used, while the behavior in this case (regarding more rows) is exactly what you want.
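
For readers unfamiliar with the .by argument mentioned above, here is a minimal sketch of the two grouping styles in dplyr 1.1.0 or later. The data frame `aq` and its columns are hypothetical and only for illustration. The sketch covers the one-row-per-group case only and does not test whether .by affects the deprecation warning.

```r
library(dplyr)

# Hypothetical toy data; the names are illustrative, not taken from openair.
aq <- tibble(
  site = c("A", "A", "B", "B"),
  pm25 = c(4, 8, 10, 20)
)

# Persistent grouping: group_by() first, then summarise() per group.
grouped <- aq %>%
  group_by(site) %>%
  summarise(mean_pm25 = mean(pm25))

# Per-operation grouping: .by applies to this call only,
# and the result is always an ungrouped data frame.
by_arg <- aq %>%
  summarise(mean_pm25 = mean(pm25), .by = site)
```

Both calls compute the same per-site means here. The difference is that .by never leaves grouping attached to the result.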

jack-davison commented 1 year ago

Morning @Cuevman81 - can you give it another go on the dev version?

I think I found the specific instance of summarise() that was causing the issue. In {dplyr} v1.0.0, summarise() was allowed to return more than one row per group, so we switched a lot of code over to that syntax. Then v1.1.0 deprecated that behaviour.

reframe() is almost a drop-in replacement, but it always returns ungrouped data. @mooibroekd I've not tried using .by to get around this issue, and I've kept the group_by() for now for consistency with the rest of {openair}. It may be something that we explore in a bigger refactor, as the .by syntax is a lot neater (and removes the issue of having grouped data floating around doing odd things).
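
As an illustration of the summarise() to reframe() swap described above, here is a minimal sketch that returns several percentile rows per group. It is not the actual openair code. The data frame `aq`, its columns, and the `probs` vector are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; the names are illustrative, not taken from openair.
aq <- tibble(
  site = rep(c("A", "B"), each = 4),
  pm25 = c(1, 2, 3, 4, 10, 20, 30, 40)
)

probs <- c(0.25, 0.75)

# reframe() may return any number of rows per group (two here, one per
# percentile) and always returns an ungrouped data frame.
pct <- aq %>%
  group_by(site) %>%
  reframe(
    percentile = probs,
    value = quantile(pm25, probs, names = FALSE)
  )

# Downstream code that relied on a grouped result must regroup explicitly.
pct_grouped <- pct %>% group_by(site)
```

The explicit regrouping step at the end is where reframe() stops being a pure drop-in for code that expected summarise() to keep some grouping in place.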

mooibroekd commented 1 year ago

@jack-davison Thanks! I also think the .by syntax is neater. It also makes it clear that the grouping is only needed for the summarise() call (or whichever other function uses it), rather than for everything executed after an initial group_by(). So in a sense it is more readable.

Cuevman81 commented 1 year ago

Hey @jack-davison, sure. How do I access the dev version? Do I just need to install it in RStudio, or run the dev R script? Thanks!

jack-davison commented 1 year ago

Apologies! You can install the development version of openair using the {pak} package:

# Install {pak}, then use it to install openair from GitHub
install.packages("pak")
pak::pak("davidcarslaw/openair")

Cuevman81 commented 1 year ago

@jack-davison Got it, good to go. I installed the dev version and ran percentileRose(aq_MetHourly_DeSoto, pollutant = "pm25") with no errors. Thanks so much for the quick work!

jack-davison commented 1 year ago

Great to hear, and thanks for checking it for us!