chris-mcginnis-ucsf / DoubletFinder

R package for detecting doublets in single-cell RNA sequencing data
414 stars 109 forks source link

compatibility with seurat version 5: paramSweep_v3 #161

Closed aCompanionUnobtrusive closed 11 months ago

aCompanionUnobtrusive commented 1 year ago

Hello,

I recently upgraded to seurat version 5, and now one of the DoubletFinder function is no longer working.

With earlier Seurat, I ran the following code without any issues:

sweep.res.list <- paramSweep_v3(seurobj, PCs = 1:10, sct = FALSE)

But now with seurat version 5 I have the following error:

Error in paramSweep_v3(seurobj, PCs = 1:dims$expDims_DF, sct = FALSE) :
  no slot of name "counts" for this object of class "Assay5"

my package version is:

packageVersion('DoubletFinder')
[1] ‘2.0.3’

Has anyone found workarounds for this issue, and are there plans to update DoubletFinder to work with Seurat 5?

Thanks

mmarchin commented 1 year ago

I just ran into this issue too, and it seems like if you change the @counts in the function to $counts instead, it works. You also have to do this for the doubletFinder_v3 function.

paramSweep_v3_Seurat5<-function (seu, PCs = 1:10, sct = FALSE, num.cores = 1) 
{
    require(Seurat)
    require(fields)
    pK <- c(5e-04, 0.001, 0.005, seq(0.01, 0.3, by = 0.01))
    pN <- seq(0.05, 0.3, by = 0.05)
    min.cells <- round(nrow(seu@meta.data)/(1 - 0.05) - nrow(seu@meta.data))
    pK.test <- round(pK * min.cells)
    pK <- pK[which(pK.test >= 1)]
    orig.commands <- seu@commands
    if (nrow(seu@meta.data) > 10000) {
        real.cells <- rownames(seu@meta.data)[sample(1:nrow(seu@meta.data), 
            10000, replace = FALSE)]
        data <- seu@assays$RNA$counts[, real.cells]
        n.real.cells <- ncol(data)
    }
    if (nrow(seu@meta.data) <= 10000) {
        real.cells <- rownames(seu@meta.data)
        data <- seu@assays$RNA$counts
        n.real.cells <- ncol(data)
    }
    if (num.cores > 1) {
        require(parallel)
        cl <- makeCluster(num.cores)
        output2 <- mclapply(as.list(1:length(pN)), FUN = parallel_paramSweep_v3, 
            n.real.cells, real.cells, pK, pN, data, orig.commands, 
            PCs, sct, mc.cores = num.cores)
        stopCluster(cl)
    }
    else {
        output2 <- lapply(as.list(1:length(pN)), FUN = parallel_paramSweep_v3, 
            n.real.cells, real.cells, pK, pN, data, orig.commands, 
            PCs, sct)
    }
    sweep.res.list <- list()
    list.ind <- 0
    for (i in 1:length(output2)) {
        for (j in 1:length(output2[[i]])) {
            list.ind <- list.ind + 1
            sweep.res.list[[list.ind]] <- output2[[i]][[j]]
        }
    }
    name.vec <- NULL
    for (j in 1:length(pN)) {
        name.vec <- c(name.vec, paste("pN", pN[j], "pK", pK, 
            sep = "_"))
    }
    names(sweep.res.list) <- name.vec
    return(sweep.res.list)
}
gangcai commented 1 year ago

Furthermore, doubletFinder_v3 is also need to be changed accordingly:

doubletFinder_v3_SeuratV5 <- function (seu, PCs, pN = 0.25, pK, nExp, reuse.pANN = FALSE, 
          sct = FALSE, annotations = NULL) 
{
  require(Seurat)
  require(fields)
  require(KernSmooth)
  if (reuse.pANN != FALSE) {
    pANN.old <- seu@meta.data[, reuse.pANN]
    classifications <- rep("Singlet", length(pANN.old))
    classifications[order(pANN.old, decreasing = TRUE)[1:nExp]] <- "Doublet"
    seu@meta.data[, paste("DF.classifications", pN, pK, nExp, 
                          sep = "_")] <- classifications
    return(seu)
  }
  if (reuse.pANN == FALSE) {
    real.cells <- rownames(seu@meta.data)
    data <- seu@assays$RNA$counts[, real.cells]
    n_real.cells <- length(real.cells)
    n_doublets <- round(n_real.cells/(1 - pN) - n_real.cells)
    print(paste("Creating", n_doublets, "artificial doublets...", 
                sep = " "))
    real.cells1 <- sample(real.cells, n_doublets, replace = TRUE)
    real.cells2 <- sample(real.cells, n_doublets, replace = TRUE)
    doublets <- (data[, real.cells1] + data[, real.cells2])/2
    colnames(doublets) <- paste("X", 1:n_doublets, sep = "")
    data_wdoublets <- cbind(data, doublets)
    if (!is.null(annotations)) {
      stopifnot(typeof(annotations) == "character")
      stopifnot(length(annotations) == length(Cells(seu)))
      stopifnot(!any(is.na(annotations)))
      annotations <- factor(annotations)
      names(annotations) <- Cells(seu)
      doublet_types1 <- annotations[real.cells1]
      doublet_types2 <- annotations[real.cells2]
    }
    orig.commands <- seu@commands
    if (sct == FALSE) {
      print("Creating Seurat object...")
      seu_wdoublets <- CreateSeuratObject(counts = data_wdoublets)
      print("Normalizing Seurat object...")
      seu_wdoublets <- NormalizeData(seu_wdoublets, normalization.method = orig.commands$NormalizeData.RNA@params$normalization.method, 
                                     scale.factor = orig.commands$NormalizeData.RNA@params$scale.factor, 
                                     margin = orig.commands$NormalizeData.RNA@params$margin)
      print("Finding variable genes...")
      seu_wdoublets <- FindVariableFeatures(seu_wdoublets, 
                                            selection.method = orig.commands$FindVariableFeatures.RNA$selection.method, 
                                            loess.span = orig.commands$FindVariableFeatures.RNA$loess.span, 
                                            clip.max = orig.commands$FindVariableFeatures.RNA$clip.max, 
                                            mean.function = orig.commands$FindVariableFeatures.RNA$mean.function, 
                                            dispersion.function = orig.commands$FindVariableFeatures.RNA$dispersion.function, 
                                            num.bin = orig.commands$FindVariableFeatures.RNA$num.bin, 
                                            binning.method = orig.commands$FindVariableFeatures.RNA$binning.method, 
                                            nfeatures = orig.commands$FindVariableFeatures.RNA$nfeatures, 
                                            mean.cutoff = orig.commands$FindVariableFeatures.RNA$mean.cutoff, 
                                            dispersion.cutoff = orig.commands$FindVariableFeatures.RNA$dispersion.cutoff)
      print("Scaling data...")
      seu_wdoublets <- ScaleData(seu_wdoublets, features = orig.commands$ScaleData.RNA$features, 
                                 model.use = orig.commands$ScaleData.RNA$model.use, 
                                 do.scale = orig.commands$ScaleData.RNA$do.scale, 
                                 do.center = orig.commands$ScaleData.RNA$do.center, 
                                 scale.max = orig.commands$ScaleData.RNA$scale.max, 
                                 block.size = orig.commands$ScaleData.RNA$block.size, 
                                 min.cells.to.block = orig.commands$ScaleData.RNA$min.cells.to.block)
      print("Running PCA...")
      seu_wdoublets <- RunPCA(seu_wdoublets, features = orig.commands$ScaleData.RNA$features, 
                              npcs = length(PCs), rev.pca = orig.commands$RunPCA.RNA$rev.pca, 
                              weight.by.var = orig.commands$RunPCA.RNA$weight.by.var, 
                              verbose = FALSE)
      pca.coord <- seu_wdoublets@reductions$pca@cell.embeddings[, 
                                                                PCs]
      cell.names <- rownames(seu_wdoublets@meta.data)
      nCells <- length(cell.names)
      rm(seu_wdoublets)
      gc()
    }
    if (sct == TRUE) {
      require(sctransform)
      print("Creating Seurat object...")
      seu_wdoublets <- CreateSeuratObject(counts = data_wdoublets)
      print("Running SCTransform...")
      seu_wdoublets <- SCTransform(seu_wdoublets)
      print("Running PCA...")
      seu_wdoublets <- RunPCA(seu_wdoublets, npcs = length(PCs))
      pca.coord <- seu_wdoublets@reductions$pca@cell.embeddings[, 
                                                                PCs]
      cell.names <- rownames(seu_wdoublets@meta.data)
      nCells <- length(cell.names)
      rm(seu_wdoublets)
      gc()
    }
    print("Calculating PC distance matrix...")
    dist.mat <- fields::rdist(pca.coord)
    print("Computing pANN...")
    pANN <- as.data.frame(matrix(0L, nrow = n_real.cells, 
                                 ncol = 1))
    if (!is.null(annotations)) {
      neighbor_types <- as.data.frame(matrix(0L, nrow = n_real.cells, 
                                             ncol = length(levels(doublet_types1))))
    }
    rownames(pANN) <- real.cells
    colnames(pANN) <- "pANN"
    k <- round(nCells * pK)
    for (i in 1:n_real.cells) {
      neighbors <- order(dist.mat[, i])
      neighbors <- neighbors[2:(k + 1)]
      pANN$pANN[i] <- length(which(neighbors > n_real.cells))/k
      if (!is.null(annotations)) {
        for (ct in unique(annotations)) {
          neighbor_types[i, ] <- table(doublet_types1[neighbors - 
                                                        n_real.cells]) + table(doublet_types2[neighbors - 
                                                                                                n_real.cells])
          neighbor_types[i, ] <- neighbor_types[i, ]/sum(neighbor_types[i, 
          ])
        }
      }
    }
    print("Classifying doublets..")
    classifications <- rep("Singlet", n_real.cells)
    classifications[order(pANN$pANN[1:n_real.cells], decreasing = TRUE)[1:nExp]] <- "Doublet"
    seu@meta.data[, paste("pANN", pN, pK, nExp, sep = "_")] <- pANN[rownames(seu@meta.data), 
                                                                    1]
    seu@meta.data[, paste("DF.classifications", pN, pK, nExp, 
                          sep = "_")] <- classifications
    if (!is.null(annotations)) {
      colnames(neighbor_types) = levels(doublet_types1)
      for (ct in levels(doublet_types1)) {
        seu@meta.data[, paste("DF.doublet.contributors", 
                              pN, pK, nExp, ct, sep = "_")] <- neighbor_types[, 
                                                                              ct]
      }
    }
    return(seu)
  }
}
chris-mcginnis-ucsf commented 1 year ago

Hi @aCompanionUnobtrusive @mmarchin and @gangcai -- I'm looking to update DF this weekend. Thanks for posting the code -- leaving this open so others can find until I actually update things.

Lucyyang1991 commented 1 year ago

@mmarchin Thanks so much for your solution. I use this solution, and work well in seurat v5, but when I use it in seurat5+bpcells, I get this error:

Error: In function [: "j" must not have duplicated values
▆
1. ├─BiocGenerics::lapply(...)
2. └─base::lapply(...)
3. └─global FUN(X[[i]], ...)
4. └─global paramSweep_v3_Seurat5(seu = i, PCs = 1:30, sct = F)
5. ├─BiocGenerics::lapply(...)
6. └─base::lapply(...)
7. └─DoubletFinder (local) FUN(X[[i]], ...)
8. ├─data[, real.cells1]
9. └─data[, real.cells1]
10. └─BPCells:::selection_index(j, ncol(x), colnames(x))

Do you have any ideas of this error?

Lucyyang1991 commented 1 year ago

@chris-mcginnis-ucsf Thanks so much for your package, will you update this package about seurat5+bpcells workflow?

frac2738 commented 1 year ago

If you are using seurat5 + BPCells, a possible workaround is:

# set seurat to v3
options(Seurat.object.assay.version` = "v3")
# run DoubletFinder like always
seu_kidney <- CreateSeuratObject(kidney.data)
seu_kidney <- NormalizeData(seu_kidney)
seu_kidney <- FindVariableFeatures(seu_kidney, selection.method = "vst", nfeatures = 2000)
seu_kidney <- ScaleData(seu_kidney)
seu_kidney <- RunPCA(seu_kidney)
seu_kidney <- RunUMAP(seu_kidney, dims = 1:10)
sweep.res.list_kidney <- paramSweep_v3(seu_kidney, PCs = 1:10, sct = FALSE)
...

# save singlet matrix
raw_st <- doublets@assays$RNA@counts

# set seurat  v5
options(Seurat.object.assay.version = "v5")

# convert matrix to `uint32_t` (default behaviour of  `open_matrix_10x_hdf5`)
raw_feature_matrix <- convert_matrix_type(raw_st, type = "uint32_t")

# write matrix to disk and load it
write_matrix_dir(raw_feature_matrix, dir = paste0(bpcells_dir,"/",sample_name))
raw_feature_matrix <- open_matrix_dir(dir = paste0(bpcells_dir,"/",sample_name))

# continue with seurat
object <- CreateSeuratObject(raw_feature_matrix)

object is now a seurat 5 object containing only the singlets from DoubletFinder

BenxiaHu commented 1 year ago

Hi @aCompanionUnobtrusive @mmarchin and @gangcai -- I'm looking to update DF this weekend. Thanks for posting the code -- leaving this open so others can find until I actually update things.

Have you update/fixed these errors?

FerrenaAlexander commented 12 months ago

Hi, just ran into this issue too. I guess now Seurat installs version 5 by default from CRAN rather than 4, so many more people and packages will likely run into these kinds of issues. An update would be amazing. Thanks, @chris-mcginnis-ucsf for all your hard work on this great package!

hanhyebin commented 12 months ago

Hi just wanted to bring this issue back to attention. Thanks @chris-mcginnis-ucsf !!!

aCompanionUnobtrusive commented 11 months ago

Not an elegant work around, but what I did was create an older version seurat object to run DF, and then exported the metadata of the barcodes and their doublet classification, and then added this data with Seurat::AddMetadata() to a brand new version 5 seurat object, and subset accordingly.

chris-mcginnis-ucsf commented 11 months ago

I just fixed it -- should work with Seurat V5 now.

Chris