JasonJWilliamsNY opened 7 years ago
Preface: I'm adding to this as part of Software/Data Carpentry instructor training.
Perhaps I don't quite understand the purpose of the R-genomics lesson (~or whether it's still being actively developed~ just found 'will become available in June 2018'), but the lack of Bioconductor content also really surprised me. In my mind, and I think in the minds of many other bioinformaticians, R + genomics = Bioconductor.
I love data frames, dplyr, and the other content used in this lesson for general data manipulation and analysis. But not using the existing Bioconductor infrastructure (and its extensive teaching materials!) seems a real shame and a missed opportunity.
Finally, to mention a more speculative option, there is a dplyr-like 'grammar of genomic data manipulation' being developed that adds a tidyverse flavour to core Bioconductor data structures (https://github.com/sa-lee/plyranges). While it's not yet ready for prime time, it may be useful in the future for learners who are familiar with R (or at least the tidyverse) but not yet with Bioconductor.
It's important for folks to know about Bioconductor. Add a lesson where we install and use a simple library, perhaps something to help us parse VCF files, as in #46.