Different group labels for the same predicted cell in integrated scRNA #1351

Open Zepeng-Mu opened 2 years ago

Zepeng-Mu commented 2 years ago

Hi, I have noticed strange behavior when adding a GeneIntegrationMatrix. I posted a Discussion several months ago, but I think it is useful to open an Issue as well.

Basically, when integrating scRNA into scATAC, the same scRNA barcode is sometimes assigned to more than one scATAC cell. In these cases, I have noticed that the nameGroup from integration can carry DIFFERENT labels for the same scRNA cell.

The Discussion post I created earlier, which gives a detailed example, is here:

I'm using release_1.0.2.


rcorces commented 2 years ago

Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information:
1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example.
3. Did you post your log file? If not, add it now.

Zepeng-Mu commented 2 years ago

Hi @rcorces, I can confirm this behavior is reproducible using tutorial data.


library(ArchR)
addArchRGenome("hg19") # the Hematopoiesis tutorial data uses hg19

inputFiles <- getTutorialData("Hematopoiesis")

addArchRThreads(threads = 14) 

ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  sampleNames = names(inputFiles),
  minTSS = 4, # Don't set this too high because you can always increase it later
  minFrags = 1000,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)

doubScores <- addDoubletScores(
  input = ArrowFiles,
  k = 10, # Refers to how many cells near a "pseudo-doublet" to count.
  knnMethod = "UMAP", # Refers to the embedding to use for nearest neighbor search with doublet projection.
  LSIMethod = 1
)

projHeme1 <- ArchRProject(
  ArrowFiles = ArrowFiles, 
  outputDirectory = "HemeTutorial",
  copyArrows = TRUE # This is recommended so that if you modify the Arrow files you have an original copy for later usage.
)

projHeme2 <- filterDoublets(projHeme1)

projHeme2 <- addIterativeLSI(
  ArchRProj = projHeme2,
  useMatrix = "TileMatrix", 
  name = "IterativeLSI", 
  iterations = 2, 
  clusterParams = list( # See Seurat::FindClusters
    resolution = c(0.2),
    sampleCells = 10000,
    n.start = 10
  ),
  varFeatures = 25000,
  dimsToUse = 1:30
)

projHeme2 <- addHarmony(
  ArchRProj = projHeme2,
  reducedDims = "IterativeLSI",
  name = "Harmony",
  groupBy = "Sample"
)

seRNA <- readRDS("scRNA-Hematopoiesis-Granja-2019.rds")

projHeme2 <- addGeneIntegrationMatrix(
  ArchRProj = projHeme2, 
  useMatrix = "GeneScoreMatrix",
  matrixName = "GeneIntegrationMatrix",
  reducedDims = "IterativeLSI",
  seRNA = seRNA,
  addToArrow = FALSE,
  groupRNA = "BioClassification",
  nameCell = "predictedCell_Un",
  nameGroup = "predictedGroup_Un",
  nameScore = "predictedScore_Un"
)

Then we can check which scRNA barcode is assigned most often:

sort(table(projHeme2$predictedCell_Un), decreasing = TRUE)[1]



So this scRNA cell is mapped to 615 cells in the scATAC dataset.

seRNA$BioClassification[colnames(seRNA) == "CD34_32_R5:CGTAGCGAGTTCGCGC-1"]
[1] "02_Early.Eryth"

In seRNA, this cell is annotated as "02_Early.Eryth". However,

table(projHeme2$predictedGroup_Un[projHeme2$predictedCell_Un == "CD34_32_R5:CGTAGCGAGTTCGCGC-1"])

        01_HSC 02_Early.Eryth  03_Late.Eryth    08_GMP.Neut 
           354            255              1              5 

Basically, for scATAC cells whose predictedCell_Un is "CD34_32_R5:CGTAGCGAGTTCGCGC-1", predictedGroup_Un takes four different values, even though the matched scRNA cell has a single annotation in seRNA.
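
The same check can be run across all cells rather than for one barcode. Below is a minimal sketch on toy data: it looks up the reference label of each matched scRNA cell and counts how often the transferred group disagrees with it. The vectors `rna_labels` and `atac` are hypothetical stand-ins with made-up values, not output from the tutorial.

```r
# Toy stand-ins for the real objects (hypothetical values, not tutorial output).
# rna_labels: scRNA barcode -> reference annotation.
rna_labels <- c(rnaA = "01_HSC", rnaB = "02_Early.Eryth")

# atac: one row per scATAC cell, with the matched scRNA cell and transferred group.
atac <- data.frame(
  predictedCell  = c("rnaA", "rnaB", "rnaB", "rnaB"),
  predictedGroup = c("01_HSC", "02_Early.Eryth", "01_HSC", "02_Early.Eryth"),
  stringsAsFactors = FALSE
)

# Label that each matched scRNA cell carries in the reference.
expected <- unname(rna_labels[atac$predictedCell])

# TRUE where the transferred group disagrees with the matched cell's own label.
mismatch <- atac$predictedGroup != expected
n_mismatch <- sum(mismatch)

# Cross-tabulate the reference label against the transferred label.
confusion <- table(expected = expected, predicted = atac$predictedGroup)
print(n_mismatch)
print(confusion)
```

With the real objects, `rna_labels` would presumably be built as `setNames(as.character(seRNA$BioClassification), colnames(seRNA))`, and the two columns would come from `projHeme2$predictedCell_Un` and `projHeme2$predictedGroup_Un`.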

Here is the log:

Zepeng-Mu commented 2 years ago

Hi, I'm wondering whether this is caused by some stochasticity in Seurat's TransferData, given that it is run twice: once to get cellGroup and once to get cellName?
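
To illustrate the hypothesis, here is a toy sketch that does not call Seurat or ArchR. `transfer_once` is a hypothetical stand-in that assigns reference cells at random, which is far cruder than a real label transfer; the only point is that two independent runs of a non-deterministic step can yield a cell and a group that disagree, while deriving the group from the already-assigned cell cannot.

```r
set.seed(1)

# Hypothetical reference: scRNA barcode -> annotation.
rna_labels <- c(rnaA = "01_HSC", rnaB = "02_Early.Eryth", rnaC = "03_Late.Eryth")

# Hypothetical stand-in for a transfer step whose result varies between runs.
transfer_once <- function(n) sample(names(rna_labels), n, replace = TRUE)

n_atac <- 1000

# Two independent runs: one supplies the cell, the other supplies the group.
cell_run  <- transfer_once(n_atac)
group_run <- unname(rna_labels[transfer_once(n_atac)])
two_run_mismatch <- sum(group_run != unname(rna_labels[cell_run]))

# Single run: look the group up from the assigned cell, so the two always agree.
group_lookup <- unname(rna_labels[cell_run])
one_run_mismatch <- sum(group_lookup != unname(rna_labels[cell_run]))

print(c(two_runs = two_run_mismatch, one_run = one_run_mismatch))
```

If this is indeed the cause, one possible workaround in the meantime would be to overwrite the group column with a lookup from the matched scRNA cell, though whether that is the right label to keep depends on how the integration is meant to be interpreted.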

rcorces commented 2 years ago

That seems like a likely culprit. Thanks for pointing that out. We will take a look. Sorry it's taking some time, but we will get to this eventually.