GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

doubScores <- addDoubletScores() #1599

Closed alifarhat40 closed 2 years ago

alifarhat40 commented 2 years ago

This is an issue template made by the developers of ArchR. You MUST follow these instructions.

Questions related to how to use ArchR or requests for new features should be posted in the Discussions forum (https://github.com/GreenleafLab/ArchR/discussions).

Before you submit this Bug Report please update ArchR to the latest stable version and make sure that this issue has not already been fixed in the latest release. ArchR is still in active development and we will fix problems as they arise. To update ArchR:

devtools::install_github("GreenleafLab/ArchR", ref="master", repos = BiocManager::repositories())

If your issue persists, then please submit this bug report.

PLEASE FILL OUT THE RELEVANT INFORMATION AND DELETE THE UNUSED PORTIONS OF THIS ISSUE TEMPLATE.

Attach your log file: ArchR has built-in logging for all complex functions. You MUST attach your log file (indicated in the console output) to this issue. Just drag and drop it here.

Describe the bug: A clear and concise description of what the bug is.

To Reproduce: To help us optimally address your issue, please try to reproduce this issue using the tutorial hematopoiesis dataset and provide us the command(s) to reproduce your bug. Our first question to you will be "can you reproduce this with the tutorial dataset", so please do this.

Expected behavior: A clear and concise description of what you expected to happen.

Screenshots: If applicable, add screenshots to help explain your problem. Do not screenshot code or text; instead, embed it in markdown using triple backticks.

Session Info: If you do not have a log file because the function that caused the error does not produce one, please paste the output of "sessionInfo()" here.

Additional context: Add any other context about the problem here.

rcorces commented 2 years ago

Hi @alifarhat40! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information:
1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example.
3. Did you post your log file? If not, add it now.
4. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

alifarhat40 commented 2 years ago

Hello, I am just following the tutorial word for word:

doubScores <- addDoubletScores(
    input = ArrowFiles,
    k = 10, #Refers to how many cells near a "pseudo-doublet" to count.
    knnMethod = "UMAP", #Refers to the embedding to use for nearest neighbor search with doublet projection.
    LSIMethod = 1
)

ArchR-addDoubletScores-790006da71e4e-Date-2022-08-31_Time-21-00-24.log

But when I run addDoubletScores() I get an error. Everything was working fine until this point; I am just copying and pasting all the cells word for word.

Error in h(simpleError(msg, call)): error in evaluating the argument 'i' in selecting a method for function '[': Object 'TileMatrix/Info/CellNames' does not exist in this HDF5 file.

Traceback:

1. addDoubletScores(input = ArrowFiles, k = 10, knnMethod = "UMAP", LSIMethod = 1)
2. .batchlapply(args, sequential = TRUE)
3. do.call(.safelapply, args)
4. (function (..., threads = 1, preschedule = FALSE) { ... })(useMatrix = "TileMatrix", k = 10, nTrials = 5,
       dimsToUse = 1:30, LSIMethod = 1, scaleDims = FALSE, corCutOff = 0.75, knnMethod = "UMAP",
       UMAPParams = list(n_neighbors = 40, min_dist = 0.4, metric = "euclidean", verbose = FALSE),
       LSIParams = list(outlierQuantiles = NULL, filterBias = FALSE), outDir = "QualityControl",
       threads = 1, force = FALSE, verbose = TRUE,
       logFile = "ArchRLogs/ArchR-addDoubletScores-7910063557767-Date-2022-08-31_Time-21-21-43.log",
       ArrowFiles = c("scATAC_BMMC_R1.arrow", "scATAC_CD34_BMMC_R1.arrow", "scATAC_PBMC_R1.arrow"),
       X = 1:3, FUN = function (i = NULL, ArrowFiles = NULL, useMatrix = "TileMatrix", ...) { ... },
       tstart = structure(1661980903.29408, class = c("POSIXct", "POSIXt")), subThreads = 16L)
5. lapply(...)
6. FUN(X[[i]], ...)
7. proj@cellColData[.availableCells(ArrowFile, useMatrix), ]
8. .availableCells(ArrowFile, useMatrix)
9. h5read(ArrowFile, paste0(subGroup, "/Info/CellNames"))
10. stop("Object '", name, "' does not exist in this HDF5 file.")
11. .handleSimpleError(function (cond) .Internal(C_tryCatchHelper(addr, 1L, cond)),
        "Object 'TileMatrix/Info/CellNames' does not exist in this HDF5 file.",
        base::quote(h5read(ArrowFile, paste0(subGroup, "/Info/CellNames"))))
12. h(simpleError(msg, call))

rcorces commented 2 years ago

Something is wrong with your Arrow files. I would delete everything and start from scratch.

alifarhat40 commented 2 years ago

Solved, thanks. The issue was here:

ArrowFiles <- createArrowFiles(
    inputFiles = inputFiles,
    sampleNames = names(inputFiles),
    filterTSS = 4, #Don't set this too high because you can always increase later
    filterFrags = 1000,
    addTileMat = TRUE,
    addGeneScoreMat = TRUE
)

If you set these to FALSE (addTileMat = FALSE, addGeneScoreMat = FALSE), the error appears. Not sure why, but I changed them back to TRUE and it works now.
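For anyone hitting the same error: the traceback shows addDoubletScores() running with useMatrix = "TileMatrix" (the user did not pass it, so this is the default) and failing when it reads TileMatrix/Info/CellNames from the Arrow file. An Arrow file created with addTileMat = FALSE has no TileMatrix group to read, which would explain the error. Since Arrow files are HDF5 files, one possible pre-flight check is to list their top-level groups before calling addDoubletScores(). This is a minimal sketch, assuming the Bioconductor package rhdf5 is installed (the traceback shows ArchR already calls h5read); the helper name hasArrowMatrix() is hypothetical and not part of ArchR.

```r
# Hypothetical helper, NOT part of ArchR: for each Arrow (HDF5) file, report
# whether a given top-level matrix group (e.g. "TileMatrix") is present.
# Assumes the Bioconductor package rhdf5 is installed.
hasArrowMatrix <- function(arrowFiles, matrixName = "TileMatrix") {
  vapply(arrowFiles, function(f) {
    # recursive = FALSE lists only the objects directly under the root group "/"
    topLevel <- rhdf5::h5ls(f, recursive = FALSE)$name
    matrixName %in% topLevel
  }, logical(1))
}
```

Calling hasArrowMatrix(ArrowFiles) returns a logical vector named by file; any FALSE entry marks an Arrow file that would need to be re-created with createArrowFiles(..., addTileMat = TRUE) before running addDoubletScores() with its default settings.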