Closed jashapiro closed 1 year ago
The table that I mentioned in our meeting that lists publicly available datasets is from an integration review. It looks like the table links to the papers that published each of the datasets, so it may not be as useful as I originally thought, but here it is just in case: https://www.nature.com/articles/s41587-021-00895-7/tables/2
I also want to point out that of the ones that say CITE-seq is included (if we want to use the same dataset here and for #564), the only one that is both RNA-seq and CITE-seq (rather than ATAC and CITE) takes you to the weighted nearest neighbors paper, which is an option.
Alternatively, we could use the publicly available datasets from 10X. They don't have cell types associated with them, but a few of the datasets do include a panel of CITE-seq antibodies. Here's one example: https://www.10xgenomics.com/resources/datasets/10-k-pbm-cs-from-a-healthy-donor-gene-expression-and-cell-surface-protein-3-standard-3-0-0
@jashapiro
For simplicity, it seems logical that we might use the same dataset for the following module, where we demonstrate methods ~for dataset selection.~
---> for cell-type annotation, right? :) My brain first went to "we're not planning to show them how to navigate databases" before I realized this is probably a typo!
---> for cell-type annotation
Yes, correct! Updated the text.
Noting that from https://www.nature.com/articles/s41587-021-00895-7/tables/2, there are two references with both RNA and surface proteins:
Noting from 10x -
EDIT: This is 10000% because I forgot to remove `total==0` cells first! Filtering is much friendlier to my computer now, as expected :).
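For reference, dropping zero-count barcodes before filtering can be sketched in base R as below. This is a minimal sketch on a made-up toy matrix (the object names `counts`, `totals`, and `filtered` are illustrative, not taken from the notebook), and it does not assume which filtering step the comment above refers to.

```r
# Toy count matrix: genes in rows, barcodes (droplets) in columns.
# matrix() fills column by column, so each line below is one barcode.
counts <- matrix(
  c(0, 0, 0,   # barcode1: no counts at all
    5, 1, 0,   # barcode2
    0, 0, 0,   # barcode3: no counts at all
    2, 0, 7),  # barcode4
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("barcode", 1:4))
)

# Total counts per barcode
totals <- colSums(counts)

# Keep only barcodes with at least one count;
# drop = FALSE keeps the result a matrix even if one column remains
filtered <- counts[, totals > 0, drop = FALSE]
```

The same logical vector (`totals > 0`) could be used to subset the columns of a larger object before running the more expensive filtering steps.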
I wonder if we might consider using a dataset (already SCE objects) in the `scRNAseq` package: https://bioconductor.org/packages/release/data/experiment/vignettes/scRNAseq/inst/doc/scRNAseq.html#available-data-sets.
A few of these datasets, based on some Ctrl+F'ing of the manual, have CITE-seq and can be accessed with these functions. The first two seem like better places to start looking.
KotliarovPBMCData()
Kotliarov, Y., R. Sparks, A. Martins, M. Mulè, Y. Lu, M. Goswami, L. Kardava, et al. 2020. “Broad Immune Activation Underlies Shared Set Point Signatures for Vaccine Responsiveness in Healthy Individuals and Disease Activity in Patients with Lupus.” Nat. Med. 26 (4): 618–29.
MairPBMCData()
Mair, F., J. R. Erickson, V. Voillet, Y. Simoni, T. Bi, A. J. Tyznik, J. Martin, R. Gottardo, E. W. Newell, and M. Prlic. 2020. “A Targeted Multi-omic Analysis Approach Measures Protein Expression and Low-Abundance Transcripts on the Single-Cell Level.” Cell Rep 31 (1): 107499.
StoeckiusHashingData()
(beware: this is hashing data, and only "mostly human")
Stoeckius, M., S. Zheng, B. Houck-Loomis, S. Hao, B. Z. Yeung, W. M. Mauck, P. Smibert, and R. Satija. 2018. “Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics.” Genome Biol. 19 (1): 224.
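A quick way to inspect the first two candidates might look like the sketch below. The helper name `load_cite_seq_candidate()` is hypothetical, and it assumes the `scRNAseq` and `SingleCellExperiment` packages are installed. I expect the antibody (ADT) counts to show up as an alternative experiment, but that is an assumption to verify against the returned objects.

```r
# Hypothetical helper: fetch one of the candidate datasets and report
# which alternative experiments (e.g., ADT counts) it carries.
load_cite_seq_candidate <- function(dataset = c("Kotliarov", "Mair")) {
  dataset <- match.arg(dataset)

  # Both accessors are listed in the scRNAseq manual; each downloads
  # the data via ExperimentHub on first use.
  sce <- switch(dataset,
    Kotliarov = scRNAseq::KotliarovPBMCData(),
    Mair = scRNAseq::MairPBMCData()
  )

  # Print the names of any alternative experiments for a quick check
  message(
    "Alternative experiments: ",
    paste(SingleCellExperiment::altExpNames(sce), collapse = ", ")
  )

  sce
}

# Example usage (downloads data, so it is not run here):
# kotliarov_sce <- load_cite_seq_candidate("Kotliarov")
```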
The introductory module is intended to review importing preprocessed data, followed by filtering and normalization. For simplicity, it seems logical that we might use the same dataset for the following module, where we demonstrate methods for cell-type assignment. To prepare for implementing these modules, we will first need to select a dataset to use.
The main dataset used in this module should probably be the unfiltered output of Cell Ranger from a public dataset. We could use alevin-fry output, but I expect that Cell Ranger would be more broadly useful. We can then start with the `DropletUtils::read10xCounts()` function.

To complete this issue, we should create a notebook and/or scripts in the `scRNA-seq-advanced/setup` directory that include the following steps for the chosen dataset:

- Importing the data into a `SingleCellExperiment` object
- Filtering with `emptyDropsCellRanger` and `miQC`
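As a starting point for that notebook, the steps above could be sketched roughly as follows. This is a sketch under stated assumptions, not the final workflow: the function name `prepare_sce()`, the 0.01 FDR cutoff, and the `^MT-` pattern for human mitochondrial gene symbols are all illustrative choices, and the code assumes `DropletUtils`, `scuttle`, and `miQC` are installed.

```r
# Hypothetical wrapper around the import and filtering steps.
# cellranger_dir: path to an unfiltered (raw) Cell Ranger matrix directory
# fdr_cutoff: assumed FDR threshold for calling a droplet a cell
prepare_sce <- function(cellranger_dir, fdr_cutoff = 0.01) {
  # Import the raw counts as a SingleCellExperiment object
  sce <- DropletUtils::read10xCounts(cellranger_dir, col.names = TRUE)

  # Distinguish cell-containing droplets from empty ones
  droplet_stats <- DropletUtils::emptyDropsCellRanger(
    SingleCellExperiment::counts(sce)
  )
  # FDR is NA for untested barcodes; which() drops those
  keep <- which(droplet_stats$FDR <= fdr_cutoff)
  sce <- sce[, keep]

  # Per-cell QC metrics, including the mitochondrial percentage
  # (assumes human gene symbols that start with "MT-")
  mito_genes <- grep("^MT-", SummarizedExperiment::rowData(sce)$Symbol)
  sce <- scuttle::addPerCellQC(sce, subsets = list(mito = mito_genes))

  # Fit the miQC mixture model and remove likely compromised cells
  model <- miQC::mixtureModel(sce)
  miQC::filterCells(sce, model)
}

# Example usage (path is a placeholder):
# filtered_sce <- prepare_sce("path/to/raw_feature_bc_matrix")
```

Keeping each step as a separate call should make it easy to split them apart later when adding instructional detail.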
Following the review of this notebook and the selection of the dataset, we will separate out the various steps, adding detail and commentary for instruction that this initial notebook does not need to include.
Note: We may want to break up this issue into sub-issues. For tracking purposes (as this issue blocks others), we may want to keep this as a meta-issue and make any sub-issues blockers for it.