carmonalab / ProjecTILs

Interpretation of cell states using reference single-cell maps
GNU General Public License v3.0

Unable to classify half of my CD8+ T cells using custom-built reference #65

Closed: france-hub closed this issue 8 months ago

france-hub commented 8 months ago

Hello,

Thank you for this package. I am trying to classify the CD8+ T cell states of my dataset using a custom-built reference from https://www.science.org/doi/10.1126/science.abe6474?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed

In the code below, CD8 is my query (already normalized) and CD8sc is the CD8 Seurat object from the paper.

library(Seurat)
library(STACAS)
library(ProjecTILs)

CD8sc <- readRDS("./CD8.thisStudy_10X.seu.rds") # load the CD8 Seurat object from the Science paper

# Use STACAS to batch-correct and integrate
CD8sc <- CD8sc |>
  NormalizeData(verbose = FALSE) |>
  FindVariableFeatures(nfeatures = 2e3, selection.method = "vst") |>
  ScaleData() |>
  SplitObject(split.by = "batchV") |>
  Run.STACAS(dims = 1:20, cell.labels = "meta.cluster", anchor.features = 2000) |>
  RunUMAP(dims = 1:20)

# Build the reference map from the integrated object
ref <- make.reference(ref = CD8sc, ndim = 20, seed = 1234,
                      annotation.column = "meta.cluster")

DefaultAssay(ref) <- "RNA"
DefaultAssay(CD8) <- "RNA" 

query.projected <- make.projection(query = CD8, ref = ref, skip.normalize = TRUE)

At this point, if I use plot.projection, all the projections make sense. However, when I run

query.projected <- cellstate.predict(ref = ref, query = query.projected)
table(query.projected$functional.cluster, useNA = "ifany")

I get 12,607 NAs. Since my dataset has ~24k cells, that means about half of my cells are not classified (right?).
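For reference, the unclassified fraction can be computed directly by counting NA labels. This is a minimal sketch: the `labels` vector and its values are made up for illustration and stand in for query.projected$functional.cluster.

```r
# Hypothetical stand-in for query.projected$functional.cluster
# (the label names here are illustrative only).
labels <- c("Tex", NA, "Tn", NA, "Tem", "Tn", NA, "Tex")

n_unassigned <- sum(is.na(labels))       # number of unclassified cells
frac_unassigned <- mean(is.na(labels))   # fraction of unclassified cells

cat(sprintf("%d of %d cells unassigned (%.1f%%)\n",
            n_unassigned, length(labels), 100 * frac_unassigned))
```

On the real object, replacing `labels` with query.projected$functional.cluster should give the same two numbers for the full dataset.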

If I use the built-in reference for CD8 ("CD8T_human_ref_v1.rds"), I get far fewer NA values. So I was wondering whether I am doing something wrong when building the reference atlas.

Thanks! Francesco

mass-a commented 8 months ago

Hey Francesco, great that you could make a custom reference!

In the latest version of ProjecTILs, cells that cannot be confidently assigned to a reference cell type are set to NA. The threshold for "confidently" assigning a cell type is controlled by the min.confidence parameter (default 0.5) of the function cellstate.predict. I suggest you lower this parameter. For example, you could rerun your analysis with min.confidence=0, inspect the distribution of confidence scores, e.g. with hist(query.projected$functional.cluster.conf), and then choose a threshold that leaves an acceptable fraction of cells unassigned.
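A rough sketch of that workflow is shown here in base R. It does not call ProjecTILs: the `meta` data frame is simulated and only stands in for the metadata you would get after cellstate.predict(..., min.confidence = 0). The label names, the uniform scores, the candidate thresholds and the "score below threshold means unassigned" rule are all assumptions made for illustration.

```r
# Simulated stand-in for query.projected@meta.data after running
# cellstate.predict(ref = ref, query = query.projected, min.confidence = 0).
set.seed(42)
n <- 1000
meta <- data.frame(
  functional.cluster = sample(c("typeA", "typeB", "typeC"), n, replace = TRUE),
  functional.cluster.conf = runif(n),  # simulated confidence scores in [0, 1]
  stringsAsFactors = FALSE
)

# Summarise the score distribution; on real data, hist() gives the same view.
print(quantile(meta$functional.cluster.conf, probs = c(0.1, 0.25, 0.5, 0.75)))

# Fraction of cells left unassigned at each candidate threshold.
# Assumption: a cell is unassigned when its score is below the threshold.
thresholds <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5)
frac_unassigned <- vapply(
  thresholds,
  function(t) mean(meta$functional.cluster.conf < t),
  numeric(1)
)
print(data.frame(min.confidence = thresholds, frac_unassigned = frac_unassigned))

# Apply a chosen cutoff by masking low-confidence labels with NA.
chosen <- 0.3
meta$label.filtered <- ifelse(
  meta$functional.cluster.conf < chosen, NA, meta$functional.cluster
)
print(table(meta$label.filtered, useNA = "ifany"))
```

With real scores, the table of thresholds against unassigned fractions shows how many cells each cutoff would cost before you commit to a value of min.confidence.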

As for why this custom reference gives a lower fraction of confident assignments than the default reference, my guess is that Zheng et al. defined many small subtypes, so it is harder to assign a cell unequivocally to one of them. A lower threshold on the confidence score may therefore be appropriate. By the way, we also have a reference map for human CD8 T cells; you can see an example of its application in this case study.

Best -m

france-hub commented 8 months ago

Thanks! I'll follow your suggestions

Francesco