AlexsLemonade / training-modules

A collection of modules that are combined into 1-5 day workshops on computational topics for the childhood cancer research community.
Other
61 stars 27 forks source link

Proposal: modernize R/tidyverse usage #598

Closed jashapiro closed 3 months ago

jashapiro commented 1 year ago

We have all been around a while, so we are using a few "old" conventions when writing our R code. I wonder if it isn't time to try to adopt the current practices in a few places.

Here are my current thoughts on changes we could make:

Most of these changes are relatively small and should be quick to implement. If we do this, I think it makes sense to pilot this in the advanced single cell notebooks, and update all the others as we next teach them.

If there are other modernizations we should think about, please add them as comments below.

jashapiro commented 1 year ago

Tagging @allyhawkins & @sjspielman for comments...

I think we seem to be moving toward the vars() in advanced single cell, but we should decide on |> as soon as possible so we can all get in the right habit!

allyhawkins commented 1 year ago

I am on board with moving to vars and also using the |> pipe. Although I have a terrible time remembering to use it in scpcaTools, maybe using it in training will help fix that.

I don't think we need to change select? I thought you could use helpers inside the function, but you don't need to?

I have no strong feelings about purrr so I'll leave that one up to you guys.

sjspielman commented 1 year ago

I'm on board with all of this.

@allyhawkins the all_of stuff is needed, but it didn't used to be so newer dplyr versions complain. I assume that's what we mean here?

Part of the immediate purrr decision is whether we want to bump renv for this training, or wait for next?

jashapiro commented 1 year ago

@allyhawkins the all_of stuff is needed, but it didn't used to be so newer dplyr versions complain. I assume that's what we mean here?

Yes, that is what I am referring to. It really only matters in cases where you have an external vector of the columns you want select: dplyr::select(my_cols) used to work just fine, but now you need dplyr::select(all_of(my_cols)). I don't know if we use that anywhere, but I kind of suspect we do somewhere...

jashapiro commented 1 year ago

Part of the immediate purrr decision is whether we want to bump renv for this training, or wait for next?

I think these are independent. We can use the \(x) syntax with the current version or ~ with 1.0: it is only the examples/recommendations that have changed.

sjspielman commented 1 year ago

I think we mostly ended up taking care of this. We have done:

We probably want to come back and get rid of the vector/quotes here, though, and just rely on the magic of tidyeval. Low stakes and not going to be updated for June 2023, but next time! So I'll leave this issue open. https://github.com/AlexsLemonade/training-modules/blob/bbcc758f573f60e8967bc80936519740df3a7977/scRNA-seq/03-normalizing_scRNA.Rmd#L85-L91

jashapiro commented 3 months ago

I think this issue can be considered pretty well closed. It has at least outlived its usefulness as an issue.