datacarpentry / R-genomics

Lesson on data analysis and visualization in R for genomics
http://datacarpentry.github.io/R-genomics

Add lesson on Bioconductor #67

Open JasonJWilliamsNY opened 7 years ago

JasonJWilliamsNY commented 7 years ago

It's important for folks to know about Bioconductor. Add a lesson where we install and use a simple package, perhaps something to help us parse VCF files, as in #46.

PeteHaitch commented 6 years ago

Preface: I'm adding to this as part of Software/Data Carpentry instructor training.

Perhaps I don't quite understand the purpose of the R-genomics lesson (~or whether it's still being actively developed~ just found 'will become available in June 2018'), but the lack of Bioconductor content also really surprised me. In my mind, and I think in the minds of many other bioinformaticians, R + genomics = Bioconductor.

I love data frames, dplyr, and the other content used in this lesson for general data manipulation and analysis. But not using the existing Bioconductor infrastructure (and extensive teaching materials!) seems a real shame and a missed opportunity.

Finally, to mention a more speculative option, there is a dplyr-like 'grammar of genomic data manipulation' being developed that adds a tidyverse flavour to core Bioconductor data structures (https://github.com/sa-lee/plyranges). While not ready for prime time, it may be useful in the future for learners who are familiar with R (or at least the tidyverse) but not yet with Bioconductor.