Page 2: Introduction: Paragraphs 3-5

I think these three paragraphs could be merged into one, because all of them describe what we would like to have in an ideal package. Suggested text:

One of the most popular frameworks for interactive analysis is the R programming language, extended for biological data analysis through the Bioconductor project (Huber et al., 2015). While Bioconductor packages have been widely used for bulk RNA-seq data, the existing data structures (such as the ExpressionSet class) are not sufficient for scRNA-seq data, because they do not support data types that are specific to single-cell studies, e.g., cell-cell distance matrices for clustering. For larger studies, these data types also include data beyond expression profiles, such as intensity values from fluorescence-activated cell sorting, cell imaging data, and information from epigenetic and targeted genotyping assays. Existing methods for processing and applying quality control to scRNA-seq data are similarly inadequate. In particular, current visualisation methods designed for exploratory data analysis of bulk transcriptomic experiments are unsuited to datasets containing hundreds or thousands of cells. The large size of each dataset also favours methods such as kallisto (Bray et al., 2016) and Salmon (Patro et al., 2015) for rapidly quantifying gene expression. Extensions to the current computational infrastructure are required to provide appropriate data structures and methods that can accommodate these rich scRNA-seq datasets for integrative analyses of expression and other assay data along with the accompanying metadata.

