Date/Time: Nov 7th 9am - Nov 8th (midnight)
Room: MR5 3005
Datapalooza: Nov 9th-10th (Th-Fr)
Single-cell RNA-seq (scRNA-seq) has now become routine, and there are hundreds of published scRNA-seq datasets from various biological systems. One of the key applications of scRNA-seq is to separate cell types. For example, some studies identify novel cell types that were previously hidden by population averaging; others use the data to classify individual cells as cancer cells or normal cells, leading to cleaner expression profiles.
A typical analysis uses an unsupervised dimensionality-reduction method (like t-SNE, PCA, or MDS) to visualize the high-dimensional expression data and identify clusters of individual cells. These clusters are then annotated post hoc based on their gene expression patterns. Most effort so far has gone into unsupervised analysis of scRNA-seq data, where the features that distinguish the clusters are derived from the factors used to form them. A supervised approach could provide a better way to identify salient features.
Therefore, we are now interested in applying supervised machine learning methods to build models that can classify individual cells into cell types. These models will be useful for (at least) two potential downstream applications:
They could span multiple data sets, yielding a pan-cell-type predictor that could be fed scRNA-seq data from a new experiment and used to classify known cell types.
They will provide a novel look at the feature set that defines a cell type, which is not revealed by the unsupervised methods.
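To make the supervised idea concrete, here is a minimal sketch using only the Python standard library. It is not the method chosen for this project: the nearest-centroid classifier, the log-normalization to 10,000 counts per cell, the toy count matrix, and the function names (log_normalize, fit_centroids, predict, top_genes) are all assumptions made for illustration. It shows both downstream uses: classifying a new cell, and ranking the genes that separate two cell types.

```python
import math
from collections import defaultdict


def log_normalize(counts):
    """Scale one cell's counts to sum to 1e4, then apply log1p (an assumed convention)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.log1p(c / total * 1e4) for c in counts]


def fit_centroids(cells, labels):
    """Return {label: mean normalized expression vector} from labeled training cells."""
    sums = {}
    n_cells = defaultdict(int)
    for counts, label in zip(cells, labels):
        vec = log_normalize(counts)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        n_cells[label] += 1
    return {label: [s / n_cells[label] for s in total] for label, total in sums.items()}


def predict(centroids, counts):
    """Assign a cell to the label whose centroid is nearest in Euclidean distance."""
    vec = log_normalize(counts)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


def top_genes(centroids, genes, type_a, type_b, k=2):
    """Rank genes by absolute difference between the centroids of two cell types."""
    diffs = [(abs(a - b), g) for a, b, g in zip(centroids[type_a], centroids[type_b], genes)]
    return [g for _, g in sorted(diffs, reverse=True)[:k]]


# Toy data, made up for illustration: rows are cells, columns are genes.
GENES = ["CD3E", "MS4A1", "ACTB"]
TRAIN = [[20, 0, 50], [15, 1, 40], [0, 18, 45], [1, 25, 60]]
LABELS = ["T cell", "T cell", "B cell", "B cell"]

centroids = fit_centroids(TRAIN, LABELS)
print(predict(centroids, [12, 0, 30]))                   # label for a new, unlabeled cell
print(top_genes(centroids, GENES, "T cell", "B cell"))   # genes that best separate the two types
```

A real predictor would need held-out evaluation and a stronger model, but even this sketch exposes the feature set directly: the genes with the largest centroid differences are the ones driving the classification.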
We should seek to build a reproducible piece of software that will enable others with scRNA-seq data either to re-run our analysis to build a predictor for a new dataset, or to use the predictor we have built to classify newly sequenced single cells.
Each of you should be a member of the bds_tg group on Rivanna. We have an allocation of disk space on Rivanna at /sfs/lustre/allocations/bds_tg. I suggest everyone set an environment variable to point to this for easy communication:
export BDSDATA="/sfs/lustre/allocations/bds_tg"
You should also have access to a compute credit allocation, also called bds_tg (use allocations to see yours).
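If analysis scripts are written in Python, one way to honor the BDSDATA variable is a small path helper like the sketch below. The helper name data_path, the fallback to the allocation path when the variable is unset, and the example file names are assumptions for illustration, not an agreed project layout.

```python
import os
from pathlib import Path

# Shared allocation on Rivanna, used when BDSDATA is not set (assumed fallback).
DEFAULT_BDSDATA = "/sfs/lustre/allocations/bds_tg"


def data_path(*parts, env=None):
    """Build a path under the shared data directory named by BDSDATA."""
    env = os.environ if env is None else env
    root = Path(env.get("BDSDATA", DEFAULT_BDSDATA))
    return root.joinpath(*parts)


# Hypothetical file name, for illustration only.
print(data_path("raw", "counts.tsv"))
```

Routing every file access through one helper keeps hard-coded paths out of the code, so the same scripts can run against a local copy of the data by changing BDSDATA.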
Please commit any code into this repository.
Some planning ideas (from aakrosh):
Machine learning links:
scRNA analysis:
scRNA data:
Notes on approaches taken: