Currently, the machine learning module covers hierarchical clustering, consensus clustering, and PLIER. We also cover gene identifier conversion, melting wide genomic data, and ggplot2 along the way! I am planning on introducing a new pathway analysis module that will cover gene identifier conversion (#242), and ggplot2 was moved into the introduction to R and the Tidyverse module. That said, going from wide data to something ggplot2-ready for making a jitter plot or box plot could still be useful to include. scRNA-seq will still cover dimension reduction (sounds like it will be UMAP), and we demonstrate use of `DESeq2::plotPCA()` in the bulk RNA-seq module.
Here are some thoughts about how this module should be revamped, in no particular order:
We should make a heatmap that includes annotation bars with `ComplexHeatmap`. Instead of showing the sample-sample relationships (what we currently do), we should demonstrate how we might typically filter genes (e.g., high variance) for display and hierarchical clustering.
`ComplexHeatmap` seems to be pretty flexible regarding clustering (docs), so it's worth considering how we might progress from an object that holds clustering results to a heatmap for illustrative purposes.
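To make the gene filtering and clustering idea concrete, here is a rough sketch on a simulated matrix rather than real data. The object names (`expr_mat`, `sample_info`), the cutoff of 50 genes, the row scaling, and the average-linkage choice are all illustrative assumptions, not decisions for the module. The `ComplexHeatmap` step only runs if the package is installed.

```r
set.seed(2020)

# Simulated expression matrix: 200 genes (rows) x 12 samples (columns)
expr_mat <- matrix(
  rnorm(200 * 12),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("sample", 1:12))
)

# Hypothetical sample annotation; row order matches the matrix columns
sample_info <- data.frame(
  group = rep(c("A", "B"), each = 6),
  row.names = colnames(expr_mat)
)

# Keep the 50 genes with the highest variance across samples
gene_vars <- apply(expr_mat, 1, var)
top_genes <- names(sort(gene_vars, decreasing = TRUE))[1:50]
filtered_mat <- expr_mat[top_genes, ]

# Scale each gene (row) so colors are comparable across genes
scaled_mat <- t(scale(t(filtered_mat)))

# Cluster samples up front so the clustering is its own inspectable object
sample_hclust <- hclust(dist(t(scaled_mat)), method = "average")

if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  column_anno <- ComplexHeatmap::HeatmapAnnotation(df = sample_info)
  heatmap_obj <- ComplexHeatmap::Heatmap(
    scaled_mat,
    name = "z-score",
    top_annotation = column_anno,
    cluster_columns = sample_hclust,  # reuse the hclust object from above
    show_row_names = FALSE
  )
  ComplexHeatmap::draw(heatmap_obj)
}
```

Passing the `hclust` object to `cluster_columns` is one way to show participants that the dendrogram on the heatmap is the same clustering they can inspect separately.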
I want to retain consensus clustering. I don't expect every participant will have a use for consensus clustering, specifically, but I do think it illustrates a point worth making re: healthy skepticism around groups/clusters.
When I was first taught clustering, it was introduced alongside low-dimensional representation (LDR; it was almost certainly PCA). It could be useful to compare a scatter plot of PC1 and PC2 to a heatmap with annotation bars.
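A minimal sketch of the PCA side of that comparison, again on simulated data. The group labels, the shift added to the first 20 genes, and the use of base `prcomp()` and `plot()` are assumptions for illustration; in the module this would presumably be a ggplot2 scatter plot.

```r
set.seed(2020)

# Simulated expression matrix: 200 genes (rows) x 12 samples (columns)
expr_mat <- matrix(
  rnorm(200 * 12),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("sample", 1:12))
)

# Hypothetical groups; shift 20 genes in group B so there is structure to find
group <- rep(c("A", "B"), each = 6)
expr_mat[1:20, group == "B"] <- expr_mat[1:20, group == "B"] + 3

# prcomp() expects samples in rows, so transpose the genes x samples matrix
pca_res <- prcomp(t(expr_mat), center = TRUE, scale. = TRUE)

# Collect the first two PCs alongside the sample labels
pca_df <- data.frame(
  sample = rownames(pca_res$x),
  PC1 = pca_res$x[, "PC1"],
  PC2 = pca_res$x[, "PC2"],
  group = group
)

# Percent of variance explained by each component
pct_var <- 100 * pca_res$sdev^2 / sum(pca_res$sdev^2)

# Scatter plot of PC1 vs. PC2, colored by group
plot(
  pca_df$PC1, pca_df$PC2,
  col = as.integer(factor(pca_df$group)),
  pch = 19,
  xlab = sprintf("PC1 (%.1f%%)", pct_var[1]),
  ylab = sprintf("PC2 (%.1f%%)", pct_var[2])
)
```

Coloring the points by the same variable used for the heatmap annotation bar would let participants compare the two views directly.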
If we use the `pbta-histologies.tsv` file, it could be a good opportunity to cover some data cleaning basics, since the file includes information about DNA samples that we'd need to remove.
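The cleaning step could look something like the sketch below. It uses a small mock data frame in place of the real file, and the column names (`Kids_First_Biospecimen_ID`, `experimental_strategy`, `short_histology`), the values, and the biospecimen IDs are assumptions or placeholders that would need to be checked against `pbta-histologies.tsv` itself.

```r
# Mock stand-in for the histologies file; columns and values are assumptions
histologies_df <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_0001", "BS_0002", "BS_0003", "BS_0004"),
  experimental_strategy = c("RNA-Seq", "WGS", "RNA-Seq", "WXS"),
  short_histology = c("LGAT", "LGAT", "Medulloblastoma", "Ependymoma"),
  stringsAsFactors = FALSE
)

# Look at which assay types are present before filtering
strategy_counts <- table(histologies_df$experimental_strategy)
print(strategy_counts)

# Keep only RNA-seq samples, dropping the DNA (e.g., WGS, WXS) rows
rna_df <- histologies_df[histologies_df$experimental_strategy == "RNA-Seq", ]
print(rna_df)
```

In the module itself this would more likely be `readr::read_tsv()` followed by `dplyr::filter()`, to match what participants see in the Tidyverse material.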