b-rodrigues / rap4all

https://raps-with-r.dev/

Indirection and tidyselect #48

Open JT-39 opened 1 year ago

JT-39 commented 1 year ago

Hi Bruno,

Have you thought about updating the scripts that are prepared for inflation by fusen so that they are consistent with the syntax used to deal with indirection and tidyselect? With this syntax you refer to the columns explicitly within the pipe: for example, to filter on the column locality, you call it as .data$locality. An example from your code is:

make_commune_level_data <- function(flat_data){
  flat_data |>
    filter(!grepl("nationale|offres", .data$locality),
           !is.na(.data$locality))
}

Without this, many warnings appear when inflating, telling you either:

 make_country_level_data: no visible binding for global variable
    ‘locality’
  Undefined global functions or variables:
    locality

Or, in a tidyselect context, you cannot use .data$ and should instead enclose the variable name in quotes (""); otherwise you get the warning:

Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0. Please use
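
To make the distinction concrete, here is a minimal sketch of both cases side by side. It assumes dplyr is installed; the toy data frame, the `price` column, and the function name `keep_commune_rows` are hypothetical and not taken from the book. Data-masking verbs such as filter() take the .data$ pronoun, while tidyselect verbs such as select() take column names as strings (here wrapped in all_of()).

```r
library(dplyr)

# Hypothetical toy data: only the `locality` column name comes from the thread.
flat_data <- data.frame(
  locality = c("Luxembourg", "Moyenne nationale", NA),
  price = c(100, 90, 80)
)

# Inside a package, the pronoun is usually imported with
#   #' @importFrom rlang .data
# so that R CMD check does not flag `.data` itself.
keep_commune_rows <- function(flat_data) {
  flat_data |>
    # Data masking: refer to columns through the .data pronoun.
    filter(!grepl("nationale|offres", .data$locality),
           !is.na(.data$locality)) |>
    # Tidyselect: refer to columns by quoted name instead of .data$.
    select(all_of(c("locality", "price")))
}

result <- keep_commune_rows(flat_data)
print(result)
```

Only the "Luxembourg" row should survive the filter in this toy example, since the second row matches the pattern and the third is NA.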

I am slightly unsure whether this is best practice, and it doesn't seem to be very clear online. But I thought I would share it as food for thought, since doing so removes all the warnings. If you are interested, I am happy to share my code to save you time.

Here are some links:
https://dplyr.tidyverse.org/articles/programming.html#indirection
https://community.rstudio.com/t/use-of-data-in-tidyselect-expressions-is-now-deprecated/150092

b-rodrigues commented 1 year ago

Yep, you’re right that there are these annoying messages when inflating, but I didn’t want to go into too much detail in the book, because the main subject is not package development. These warnings are not an issue if you’re building your analysis as a package to take advantage of all the different tools that exist for development. I didn’t want to explain concepts that are not that easy to grasp and that are not really required.

JT-39 commented 1 year ago

I thought that might be the case. Just out of interest, do you know what the best practice is? (Is it to use .data$ and ""?)

asadow commented 12 months ago

The data.table team prefers to set such variables to NULL inside the function, as written here:

make_commune_level_data <- function(flat_data){
  locality <- NULL
  flat_data |> 
    filter(!grepl("nationale|offres", locality),
           !is.na(locality))
}
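
As a rough way to compare the two approaches discussed in this thread, the sketch below runs the NULL-binding version and the .data$ version on the same input and checks that they return the same rows. It assumes dplyr is installed; the toy data and the function names `with_null` and `with_pronoun` are hypothetical. It says nothing about which style is preferable, only that on this input the local `locality <- NULL` is shadowed by the column of the same name.

```r
library(dplyr)

# Hypothetical toy data: only the `locality` column name comes from the thread.
flat_data <- data.frame(
  locality = c("Luxembourg", "Moyenne nationale", NA)
)

# NULL-binding style: the local variable exists only to satisfy the checker;
# inside filter() the data-frame column takes precedence over it.
with_null <- function(flat_data) {
  locality <- NULL
  flat_data |>
    filter(!grepl("nationale|offres", locality),
           !is.na(locality))
}

# Pronoun style: the column is referenced explicitly through .data.
with_pronoun <- function(flat_data) {
  flat_data |>
    filter(!grepl("nationale|offres", .data$locality),
           !is.na(.data$locality))
}

res_null <- with_null(flat_data)
res_pronoun <- with_pronoun(flat_data)

# Both versions should keep the same rows on this input.
same_result <- isTRUE(all.equal(res_null, res_pronoun))
print(same_result)
```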