SingleR-inc / SingleR

Clone of the Bioconductor repository for the SingleR package.
https://bioconductor.org/packages/devel/bioc/html/SingleR.html
GNU General Public License v3.0

marker gene list for annotating neuronal cells #201

Closed dahun73 closed 2 years ago

dahun73 commented 3 years ago

Hello!

I have a question about the annotation of "Neurons". As I understand it, SingleR uses reference gene expression data sets to annotate cells. In my data, most cells were annotated as "Neurons". However, I couldn't find a marker gene list for neurons.

I used the "MouseRNAseqData()" function for the reference.

Also, I would like to annotate neuron subtypes in more detail, such as excitatory and inhibitory neurons. Is there a good reference for this?

Thank you, as always, for this nice tool!

Dahun

friedue commented 3 years ago

However, I couldn't find a marker gene list for neurons.

As described in the SingleR book: "The marker genes used for each label are reported in the metadata() of the SingleR() output." It's essentially this line of code:

all.markers <- metadata(predicted_labels)$de.genes
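
As far as I understand the SingleR documentation, de.genes is a nested list of character vectors, where the entry for label A versus label B holds the genes chosen as markers for A in that comparison. Under that assumption, a minimal sketch for pulling out everything used for "Neurons" could look like the block below. The helper name markers_for_label and the toy list with placeholder gene names are made up for illustration and are not SingleR output.

```r
# Sketch: collect every gene used as a marker for one label.
# Assumes `de.genes` is a nested list of character vectors in which
# de.genes[[A]][[B]] holds the genes chosen for label A versus label B.
# `markers_for_label` is a hypothetical helper, not part of SingleR.
markers_for_label <- function(de.genes, label) {
  stopifnot(label %in% names(de.genes))
  # flatten all pairwise comparisons for this label into one vector
  genes <- as.character(unlist(de.genes[[label]], use.names = FALSE))
  sort(unique(genes))
}

# Toy stand-in with placeholder gene names (NOT real SingleR output).
toy.markers <- list(
  Neurons = list(
    Neurons    = character(0),
    Astrocytes = c("GeneB", "GeneA"),
    Microglia  = c("GeneA", "GeneC")
  ),
  Astrocytes = list(
    Neurons    = c("GeneD"),
    Astrocytes = character(0),
    Microglia  = c("GeneE")
  )
)

neuron.markers <- markers_for_label(toy.markers, "Neurons")
print(neuron.markers)

# With a real SingleR result, the same call would be:
# all.markers <- metadata(predicted_labels)$de.genes
# markers_for_label(all.markers, "Neurons")
```

Taking the union over all comparisons gives one flat list per label; indexing a single comparison (for example, "Neurons" versus one other label) shows which genes separate just those two labels.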

I would like to annotate neuron subtypes in more detail, such as excitatory and inhibitory neurons. Is there a good reference for this?

I've used the Allen Brain Atlas data set in the past; you can check it out and see whether your neurons of interest are part of it. The corresponding publication is Yao et al., 2021, I believe.

Here's some example code showing how to turn the data that you can download here into a SingleCellExperiment object, which you can then use as a reference with SingleR, taking the labels from, for example, the subclass_label column. (Details on using scRNA-seq data as a SingleR reference are part of the documentation.)

library(data.table)
library(magrittr)
library(SingleCellExperiment)
library(Matrix)

gex <- fread("AllanBrainAtlas/SmartSeq/matrix.csv", sep = ",")
md <- fread("AllanBrainAtlas/SmartSeq/metadata.csv", sep = ",") 

# Columns of the metadata include:
# "sample_name", "donor_sex_label", "region_label", "subclass_label", "class_label",
# "class_color", "neighborhood_label", "neighborhood_color", "outlier_call"

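# drop the first column (assumed to be sample_name) and keep it as row names: cells x genes
gex.mat <- as.matrix(gex[, -1])
rownames(gex.mat) <- gex$sample_name

# turn into a sparse matrix and transpose to genes x cells
gex.mat <- as(gex.mat, "sparseMatrix")
gex.mat <- t(gex.mat)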

# cell metadata, with rows reordered below to match the cells (columns) of gex.mat
cd <- DataFrame(md)
rownames(cd) <- cd$sample_name
cd$sample_name <- NULL
cd <- cd[colnames(gex.mat),]

sceall <- SingleCellExperiment(assays = list(counts  = gex.mat),
    colData = cd)

table(sceall$class_label)
#>GABAergic Glutamatergic  Non-Neuronal 
#>   22745         50002          1958 

table(sceall$subclass_label)

#> (the first, unnamed column counts cells with an empty subclass_label)
#>                       Astro      CA1-ProS           CA2           CA3
#>           268           976          1701            21           315
#>          Car3            CR        CT SUB            DG          Endo
#>          1980            32           173          2469           213
#>    L2 IT ENTl     L2 IT RHP L2/3 IT CTX-1 L2/3 IT CTX-2  L2/3 IT ENTl
#>           179           375          5959           106           253
#>   L2/3 IT PPP     L3 IT ENT    L3 RSP-ACA   L4/5 IT CTX     L5 IT CTX
#>          1395           577           200         11522          2934
#> L5 IT TPE-ENT     L5 NP CTX        L5 PPP     L5 PT CTX     L6 CT CTX
#>           338          2363            47          1974          6210
#>     L6 IT CTX    L6 IT ENTl       L6b CTX    L6b/CT ENT         Lamp5
#>          5015            83          2213           693          4755
#>         Meis2     Micro-PVM        NP PPP        NP SUB         Oligo
#>           172           176           150           257           236
#>         Pvalb      SMC-Peri          Sncg           Sst     Sst Chodl
#>          4365           198          1491          5258           268
#>      SUB-ProS           V3d           Vip          VLMC
#>           467             1          6436           159
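
To connect the object built above to the annotation step, here is a rough sketch of how sceall might be passed to SingleR as a reference. The function name annotate_with_reference and the object my.sce are hypothetical. The sketch assumes the test data already carry a logcounts assay, that cells with an empty label should be dropped from the reference, and that de.method = "wilcox" is appropriate for a single-cell reference. Treat it as a starting point under those assumptions, not as the definitive workflow.

```r
# Sketch: annotate a test data set with a single-cell reference.
# `annotate_with_reference` is a hypothetical helper name.
# Requires the SingleR, scuttle and SummarizedExperiment packages.
annotate_with_reference <- function(test.sce, ref.sce,
                                    label.col = "subclass_label") {
  labels <- as.character(SummarizedExperiment::colData(ref.sce)[[label.col]])

  # drop reference cells without a usable label (assumption: these
  # should not be offered as an annotation)
  keep <- !is.na(labels) & labels != ""
  ref.sce <- ref.sce[, keep]

  # SingleR reads the "logcounts" assay by default, so normalize the
  # reference counts first
  ref.sce <- scuttle::logNormCounts(ref.sce)

  SingleR::SingleR(
    test      = test.sce,      # must already contain logcounts
    ref       = ref.sce,
    labels    = labels[keep],
    de.method = "wilcox"       # marker detection for single-cell references
  )
}

# Usage (my.sce is a hypothetical SingleCellExperiment with your cells):
# my.sce <- scuttle::logNormCounts(my.sce)
# pred   <- annotate_with_reference(my.sce, sceall)
# table(pred$labels)
```

Passing label.col = "class_label" instead would give the coarser GABAergic / Glutamatergic / Non-Neuronal split, which may be enough if you only need inhibitory versus excitatory neurons.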

If that's not suitable for your use case, try searching Google Scholar for bulk or single-cell RNA-seq data sets you could use.