GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License
384 stars 137 forks source link

CNV Analysis #312

Closed hugokitano closed 3 years ago

hugokitano commented 4 years ago

Hi, I'm an ArchR user that was directed to this repository https://github.com/GreenleafLab/10x-scATAC-2019 to perform CNV Analysis since the feature is not yet officially available in ArchR.

I posted an issue there, but realized since that project is dormant it may be wiser to post it here. First, is CNV analysis available on ArchR developmentally?

If not, I have a few questions about how this repository works. I see that the work is transferred from step to step via RDS files, so since CNV analysis is the 8th step, I will have to do some number of steps before. Which are necessary?

Another question I have is how to input multiple fragment files. I noticed that a list of vector paths, like in ArchR, doesn't work on the first step, though a single file path does. Is there a way to do this with multiple files?

Thanks!

Hugo Kitano

rcorces commented 4 years ago

CNV analysis is not currently available in ArchR. You may get a reply here about your other questions but no guarantees. I'll leave this open for now.

jgranja24 commented 3 years ago

Hi @hugokitano,

Unfortunately I do not think this will be a feature anytime soon. I have code that will work similar to that paper. However, we have not extensively validated this approach with real test samples. Thus, I am hesitant to add this to ArchR at this time.

Function(s) to run ---

#Run This to accesss hidden functions
fn <- unclass(lsf.str(envir = asNamespace("ArchR"), all = TRUE))
for (i in seq_along(fn)) {
    tryCatch({
        eval(parse(text = paste0(fn[i], "<-ArchR:::", fn[i])))
    }, error = function(x) {
    })
}

####################################################################
# Copy Number Variation Methods
####################################################################

#' Add a CNV matrix to ArrowFiles or an ArchRProject
#' 
#' This function for each sample will predict copy number variation from accessibility
#'
#' @param input An `ArchRProject` object or character vector of ArrowFiles.
#' @param chromSizes A named numeric vector containing the chromsome names and lengths. The default behavior is to retrieve this from the `ArchRProject` using `getChromSizes()`.
#' @param blacklist A `GRanges` object containing genomic regions to blacklist from calling CNVs. The default behavior is to retrieve this from the `ArchRProject` using `getBlacklist()`.
#' @param genome The name of the genome used by the `input`. The default behavior is to retrieve this from the `ArchRProject` using `getGenome()`.
#' @param windowSize The size in basepairs for the sliding window used to break up each chromosome to look for CNVs.
#' @param stepSize The size in basepairs for the step used to create sliding window bins across each chromosome.
#' @param excludeChr A character vector containing the `seqnames` of the chromosomes that should be excluded from CNV analysis.
#' @param threads The number of threads to be used for parallel computing.
#' @param parallelParam A list of parameters to be passed for biocparallel/batchtools parallel computing.
#' @param force A boolean value indicating whether to force the CNV matrix to be overwritten if it already exists for `input`.
addCNVMatrix <- function(
  input = NULL,
  chromSizes = getChromSizes(input),
  blacklist = getBlacklist(input),
  genome = getGenome(input),
  windowSize = 10e6, 
  stepSize = 2e6,
  normByNeighbors = FALSE,
  excludeChr = c("chrM","chrY"),
  threads = getArchRThreads(),
  parallelParam = NULL,
  force = FALSE,
  logFile = createLogFile("addCNVMatrix")
  ){

  .validInput(input = input, name = "input", valid = c("character", "ArchRProj"))
  .validInput(input = chromSizes, name = "chromSizes", valid = c("GRanges"))
  .validInput(input = blacklist, name = "blacklist", valid = c("GRanges", "null"))
  .validInput(input = genome, name = "genome", valid = c("character"))
  .validInput(input = windowSize, name = "windowSize", valid = c("integer"))
  .validInput(input = stepSize, name = "stepSize", valid = c("integer"))
  .validInput(input = excludeChr, name = "excludeChr", valid = c("character", "null"))
  .validInput(input = threads, name = "threads", valid = c("integer"))
  .validInput(input = parallelParam, name = "parallelParam", valid = c("parallelparam","null"))
  .validInput(input = force, name = "force", valid = c("boolean"))

  if(inherits(input, "ArchRProject")){
    ArrowFiles <- getArrowFiles(input)
    allCells <- rownames(getCellColData(input))
    outDir <- getOutputDirectory(input)
  }else if(inherits(input, "character")){
    outDir <- ""
    ArrowFiles <- input
    allCells <- NULL
  }else{
    stop("Error Unrecognized Input!")
  }
  if(!all(file.exists(ArrowFiles))){
    stop("Error Input Arrow Files do not all exist!")
  }

  #First we need to make windows for CNV
  windows <- .makeWindows(
    genome = genome,
    chromSizes = chromSizes, 
    blacklist = blacklist, 
    windowSize = windowSize, 
    stepSize = stepSize,
    threads = threads
  )
  windows <- windows[BiocGenerics::which(seqnames(windows) %bcni% excludeChr)]

  #Add args to list
  args <- list() #mget(names(formals()),sys.frame(sys.nframe()))#as.list(match.call())
  args$FUN <- .addCNVMatrix
  args$X <- seq_along(ArrowFiles)
  args$ArrowFiles <- ArrowFiles
  args$allCells <- allCells
  args$windows <- windows
  args$registryDir <- file.path(outDir, "CNVRegistry")
  args$force <- force
  args$threads <- threads
  args$logFile <- logFile
  args$normByNeighbors <- normByNeighbors
  args$windowSize <- windowSize
  args$stepSize <- stepSize
  args$excludeChr <- excludeChr

  #Run With Parallel or lapply
  outList <- .batchlapply(args)

  if(inherits(input, "ArchRProject")){
    return(input)
  }else{
    return(unlist(outList))
  }

}

.addCNVMatrix <- function(
  i = NULL,
  ArrowFiles = NULL, 
  normByNeighbors = FALSE,
  windowSize  = NULL,
  excludeChr = "",
  stepSize = NULL,
  cellNames = NULL, 
  allCells = NULL,
  windows = NULL,
  force = FALSE,
  tstart = NULL,
  subThreads = NULL,
  logFile = NULL
  ){

  ArrowFile <- ArrowFiles[i]
  o <- h5closeAll()

  #Check
  o <- h5closeAll()
  o <- .createArrowGroup(ArrowFile = ArrowFile, group = "CNVMatrix", force = force, logFile = logFile)

  # if(!suppressMessages(h5createGroup(file = ArrowFile, "CNVMatrix"))){
  #   if(force){
  #     o <- h5delete(file = ArrowFile, name = "CNVMatrix")
  #     o <- h5createGroup(ArrowFile, "CNVMatrix")
  #   }else{
  #     stop(sprintf("%s Already Exists!, set force = TRUE to override!", "CNVMatrix"))
  #   }
  # }

  tstart <- Sys.time()

  #Get all cell ids before constructing matrix
  if(is.null(cellNames)){
    cellNames <- .availableCells(ArrowFile)
  }

  if(!is.null(allCells)){
    cellNames <- cellNames[cellNames %in% allCells]
  }

  ######################################
  # Create Window Matrices Then Summarize to CNV Estimates
  ######################################
  uniqueChr <- as.character(unique(seqnames(windows)@values))

  seWindows <- .safelapply(seq_along(uniqueChr), function(x){

    o <- h5closeAll()
    chr <- uniqueChr[x]
    windowsx <- windows[BiocGenerics::which(seqnames(windows)==chr)]
    rangesx <- ranges(windowsx)
    .messageDiffTime(sprintf("Counting Windows for Chromosome %s of %s!", x, length(uniqueChr)), tstart)    

    #Read in Fragments
    fragments <- .getFragsFromArrow(ArrowFile, chr = chr, out = "IRanges", cellNames = cellNames)

    #Count Left Insertion
    temp <- IRanges(start = start(fragments), width = 1)
    stopifnot(length(temp) == length(fragments))
    oleft <- findOverlaps(ranges(rangesx), temp)
    oleft <- DataFrame(queryHits=Rle(queryHits(oleft)), subjectHits = subjectHits(oleft))

    #Count Right Insertion
    temp <- IRanges(start = end(fragments), width = 1)
    stopifnot(length(temp) == length(fragments))
    oright <- findOverlaps(ranges(rangesx), temp)
    oright <- DataFrame(queryHits=Rle(queryHits(oright)), subjectHits = subjectHits(oright))
    remove(temp)

    #Correct to RG ID
    oleft$subjectHits <- as.integer(BiocGenerics::match(mcols(fragments)$RG[oleft$subjectHits], cellNames))
    oright$subjectHits <- as.integer(BiocGenerics::match(mcols(fragments)$RG[oright$subjectHits], cellNames))
    remove(fragments)

    #Create Sparse Matrix
    mat <- Matrix::sparseMatrix(
      i = c( oleft$queryHits, oright$queryHits ),
      j = c( oleft$subjectHits, oright$subjectHits ),
      x = rep(1, nrow(oleft) + nrow(oright)),
      dims = c(length(rangesx), length(cellNames))
      )
    colnames(mat) <- cellNames

    #Summarize Split Windows
    windowSummary <- GRanges()
    countSummary <- matrix(nrow=length(unique(mcols(windowsx)$name)), ncol = ncol(mat))
    rownames(countSummary) <- unique(mcols(windowsx)$name)
    colnames(countSummary) <- cellNames

    for(y in seq_len(nrow(countSummary))){
      idx <- which(mcols(windowsx)$name == rownames(countSummary)[y])
      wx <- windowsx[idx]
      wo <- GRanges(mcols(wx)$wSeq , ranges = IRanges(mcols(wx)$wStart, mcols(wx)$wEnd))[1,]
      mcols(wo)$name <- mcols(wx)$name[1]
      mcols(wo)$effectiveLength <- sum(width(wx))
      mcols(wo)$percentEffectiveLength <- 100*sum(width(wx))/width(wo)
      mcols(wo)$GC <- sum(mcols(wx)$GC * width(wx))/width(wo)
      mcols(wo)$AT <- sum(mcols(wx)$AT * width(wx))/width(wo)
      mcols(wo)$N <- sum(mcols(wx)$N * width(wx))/width(wo)
      countSummary[y,] <- Matrix::colSums(mat[idx, ,drop=FALSE]) / sum(width(wx))
      windowSummary <- c(windowSummary, wo)
    }
    seqlevels(windowSummary) <- uniqueChr

    SummarizedExperiment::SummarizedExperiment(assays = SimpleList(counts = countSummary), rowRanges = windowSummary)

  }, threads = 1) %>% Reduce("rbind", .)

  .messageDiffTime("Filtering Low Quality Windows", tstart)

  #Keep only regions with less than 0.1% N
  seWindows <- seWindows[which(mcols(seWindows)$N < 0.001) ]

  #Keep only regions with effective size greater than 90%
  seWindows <- seWindows[rowData(seWindows)$percentEffectiveLength >= 90]

  #Normalize By Nucleotide Content KNN
  .messageDiffTime("Normalizing GC-Bias", tstart)
  k <- 25
  seWindows <- seWindows[order(mcols(seWindows)$GC)]
  assays(seWindows)$log2GCNorm <- log2( (assays(seWindows)$counts + 1e-5) / (apply(assays(seWindows)$counts, 2, function(x) .centerRollMean(x, k)) + 1e-5))

  #Normalize By Cells With Similar Total Counts Genome Wide
  if(normByNeighbors){
    .messageDiffTime("Normalizing Coverage-Bias using Neighbors", tstart)
    totalCounts <- colSums(assays(seWindows)$counts * rowData(seWindows)$effectiveLength)
    seWindows <- seWindows[, order(totalCounts)]
    assays(seWindows)$log2GCNorm <- assays(seWindows)$log2GCNorm - t(apply(assays(seWindows)$log2GCNorm, 1, function(y) .centerRollMean(y, floor(0.025 * ncol(seWindows)))))
    seWindows <- seWindows[, cellNames]
  }

  #Re-Order Ranges and Smooth Across Individual Chromosomes
  .messageDiffTime("Smoothing CNV Scores", tstart)
  seWindows <- sort(sortSeqlevels(seWindows))
  seWindows <- lapply(uniqueChr, function(x){
    seWindowsx <- seWindows[BiocGenerics::which(seqnames(seWindows) %bcin% x)]
    assays(seWindowsx)$log2GCSmooth <- apply(assays(seWindowsx)$log2GCNorm, 2, function(y) .centerRollMean(y, ceiling( (windowSize / stepSize) / 2)))
    seWindowsx
  }) %>% Reduce("rbind", .)

  #Index Windows
  windows <- lapply(split(rowRanges(seWindows), seqnames(seWindows)), function(x) {
    mcols(x)$idx <- seq_along(x)
    x
  }) %>% Reduce("c", .) %>% sortSeqlevels %>% sort
  rowRanges(seWindows) <- windows[rownames(seWindows)]

  #Write To Arrow Files
  .messageDiffTime("Adding to Arrow Files", tstart)
  dfParams <- data.frame(
    windowSize = windowSize, 
    stepSize = stepSize,
    excludeChr = excludeChr,
    stringsAsFactors = FALSE)

  featureDF <- data.frame(
    seqnames = paste0(seqnames(seWindows)), 
    idx = mcols(seWindows)$idx, 
    start = start(seWindows), 
    end = end(seWindows), 
    name = mcols(seWindows)$name,
    GC = mcols(seWindows)$GC,
    effectiveLength = mcols(seWindows)$effectiveLength,
    stringsAsFactors = FALSE)

  ######################################
  # Initialize Mat Group
  ######################################
  o <- .initializeMat(
    ArrowFile = ArrowFile,
    Group = "CNVMatrix",
    Class = "Double",
    cellNames = cellNames,
    params = dfParams,
    featureDF = featureDF,
    force = force
  )

  ######################################
  # Write Mat Group
  ######################################
  uniqueChr <- unique(paste0(seqnames(seWindows)))
  seWindows <- seWindows[, cellNames]

  for(x in seq_along(uniqueChr)){

    #Write Matrix to Arrow File!
    o <- .addMatToArrow(
      mat = Matrix::Matrix(assays(seWindows[BiocGenerics::which(seqnames(seWindows) == uniqueChr[x])])$log2GCSmooth, sparse = TRUE), 
      ArrowFile = ArrowFile, 
      Group = paste0("CNVMatrix/", uniqueChr[x]), 
      binarize = FALSE
    ) 

    gc()

  }

  return(ArrowFile)

}

.makeWindows <- function(
  genome = NULL,
  chromSizes = NULL,
  blacklist = NULL,
  windowSize = 10e6,
  stepSize = 2e6,
  threads = 1
  ){

  genome <- validBSgenome(genome)

  #Sliding Windows
  windows <- slidingWindows(x = chromSizes, width = windowSize, step = stepSize) %>% 
    .safeUnlist %>% 
      .[which(width(.)==windowSize),]

  #Add OG Info Prior to Breaking Windows Up Etc
  mcols(windows)$wSeq <- as.character(seqnames(windows))
  mcols(windows)$wStart <- start(windows)
  mcols(windows)$wEnd <- end(windows)

  message("Subtracting Blacklist from Windows...")
  windowsBL <- .safelapply(seq_along(windows), function(x){
      if(x %% 100 == 0){
        message(sprintf("%s of %s", x, length(windows)))
      }
      gr <- GenomicRanges::setdiff(windows[x,], blacklist)
      mcols(gr) <- mcols(windows[x,])
      return(gr)
    }, threads = threads)
  names(windowsBL) <- paste0("w",seq_along(windowsBL))
  windowsBL <- .safeUnlist(windowsBL, use.names = TRUE)
  mcols(windowsBL)$name <- names(windowsBL)
  names(windowsBL) <- paste0("s",seq_along(windowsBL))

  message("Adding Nucleotide Information...")
  windowSplit <- split(windowsBL, as.character(seqnames(windowsBL)))
  windowNuc <- .safelapply(seq_along(windowSplit), function(x){
      message(".", appendLF = FALSE)
      chrSeq <- Biostrings::getSeq(genome, chromSizes[BiocGenerics::which(seqnames(chromSizes) == names(windowSplit)[x])])
      grx <- windowSplit[[x]]
      aFreq <- alphabetFrequency(Biostrings::Views(chrSeq[[1]], ranges(grx)))
      mcols(grx)$GC <- rowSums(aFreq[, c("G","C")]) / rowSums(aFreq)
      mcols(grx)$AT <- rowSums(aFreq[, c("A","T")]) / rowSums(aFreq)
      return(grx)
    }, threads = threads) %>% .safeUnlist %>% sortSeqlevels %>% sort
  message("\n")
  windowNuc$N <- 1 - (windowNuc$GC + windowNuc$AT)
  windowNuc
}

.safeUnlist <- function(x, use.names = TRUE){
  y <- unlist(x, use.names = use.names)
  if(identical(y, x)){
    isGRL <- ArchR:::.isGRList(x)
    if(isGRL){
      for(i in seq_along(x)){
        if(i == 1){
          y <- x[[i]]
          if(length(y) != 0) if(use.names) names(y) <- rep(names(x)[i], length(y))
        }else{
          yi <-  x[[i]]
          if(length(yi) != 0) if(use.names) names(yi) <- rep(names(x)[i], length(yi))
          y <- c(y, yi)
        }
      }
    }else{
      stop("Unlist does not collapse/change input and is not GRangesList!")
    }
  }
  y
}

To use -

proj <- getTestProject()
proj <- addCNVMatrix(proj)
CNVMat <- getMatrixFromProject(proj, "CNVMatrix")
jgranja24 commented 3 years ago

Feel free to try it out. I am not going to be maintaining this too much more at this time, but you can feel free to comment further. I am closing this issue but you can still reply to the issue and I will look when I have time.

ccruizm commented 3 years ago

Thanks for sharing the code @jgranja24! I have given it a try but got an error:

Subtracting Blacklist from Windows...

100 of 1438

200 of 1438

300 of 1438

400 of 1438

500 of 1438

600 of 1438

700 of 1438

800 of 1438

900 of 1438

1000 of 1438

1100 of 1438

1200 of 1438

1300 of 1438

1400 of 1438

Adding Nucleotide Information...

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

2020-12-29 16:15:44 : Batch Execution w/ safelapply!, 0 mins elapsed.

Can not create group. Object with name 'CNVMatrix' already exists.

2020-12-29 16:15:45 : Counting Windows for Chromosome 1 of 23!, 0.002 mins elapsed.

2020-12-29 16:16:49 : Counting Windows for Chromosome 2 of 23!, 1.067 mins elapsed.

2020-12-29 16:17:59 : Counting Windows for Chromosome 3 of 23!, 2.226 mins elapsed.

2020-12-29 16:18:46 : Counting Windows for Chromosome 4 of 23!, 3.017 mins elapsed.

2020-12-29 16:19:30 : Counting Windows for Chromosome 5 of 23!, 3.753 mins elapsed.

2020-12-29 16:20:13 : Counting Windows for Chromosome 6 of 23!, 4.466 mins elapsed.

2020-12-29 16:20:53 : Counting Windows for Chromosome 7 of 23!, 5.126 mins elapsed.

2020-12-29 16:21:33 : Counting Windows for Chromosome 8 of 23!, 5.8 mins elapsed.

2020-12-29 16:22:02 : Counting Windows for Chromosome 9 of 23!, 6.287 mins elapsed.

2020-12-29 16:22:35 : Counting Windows for Chromosome 10 of 23!, 6.826 mins elapsed.

2020-12-29 16:23:09 : Counting Windows for Chromosome 11 of 23!, 7.402 mins elapsed.

2020-12-29 16:23:43 : Counting Windows for Chromosome 12 of 23!, 7.969 mins elapsed.

2020-12-29 16:24:23 : Counting Windows for Chromosome 13 of 23!, 8.627 mins elapsed.

2020-12-29 16:24:43 : Counting Windows for Chromosome 14 of 23!, 8.973 mins elapsed.

2020-12-29 16:25:03 : Counting Windows for Chromosome 15 of 23!, 9.295 mins elapsed.

2020-12-29 16:25:25 : Counting Windows for Chromosome 16 of 23!, 9.658 mins elapsed.

2020-12-29 16:25:50 : Counting Windows for Chromosome 17 of 23!, 10.091 mins elapsed.

2020-12-29 16:26:16 : Counting Windows for Chromosome 18 of 23!, 10.515 mins elapsed.

2020-12-29 16:26:32 : Counting Windows for Chromosome 19 of 23!, 10.78 mins elapsed.

2020-12-29 16:26:50 : Counting Windows for Chromosome 20 of 23!, 11.086 mins elapsed.

2020-12-29 16:27:06 : Counting Windows for Chromosome 21 of 23!, 11.347 mins elapsed.

2020-12-29 16:27:14 : Counting Windows for Chromosome 22 of 23!, 11.483 mins elapsed.

2020-12-29 16:27:25 : Counting Windows for Chromosome 23 of 23!, 11.673 mins elapsed.

2020-12-29 16:27:59 : Filtering Low Quality Windows, 12.225 mins elapsed.

2020-12-29 16:27:59 : Normalizing GC-Bias, 12.231 mins elapsed.

2020-12-29 16:28:02 : Smoothing CNV Scores, 12.284 mins elapsed.

Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'Reduce': please use 'assay(x, withDimnames=FALSE)) <- value' or 'assays(x,
  withDimnames=FALSE)) <- value' when the dimnames on the supplied
  assay(s) are not identical to the dimnames on
  RangedSummarizedExperiment object 'x'
Traceback:

1. addCNVMatrix(projdipg)
2. .batchlapply(args)   # at line 97 of file <text>
3. do.call(.safelapply, args)
4. (function (..., threads = 1, preschedule = FALSE) 
 . {
 .     if (tolower(.Platform$OS.type) == "windows") {
 .         threads <- 1
 .     }
 .     if (threads > 1) {
 .         o <- mclapply(..., mc.cores = threads, mc.preschedule = preschedule)
 .         errorMsg <- list()
 .         for (i in seq_along(o)) {
 .             if (inherits(o[[i]], "try-error")) {
 .                 capOut <- utils::capture.output(o[[i]])
 .                 capOut <- capOut[!grepl("attr\\(\\,|try-error", 
 .                   capOut)]
 .                 capOut <- head(capOut, 10)
 .                 capOut <- unlist(lapply(capOut, function(x) substr(x, 
 .                   1, 250)))
 .                 capOut <- paste0("\t", capOut)
 .                 errorMsg[[length(errorMsg) + 1]] <- paste0(c(paste0("Error Found Iteration ", 
 .                   i, " : "), capOut), "\n")
 .             }
 .         }
 .         if (length(errorMsg) != 0) {
 .             errorMsg <- unlist(errorMsg)
 .             errorMsg <- head(errorMsg, 50)
 .             errorMsg[1] <- paste0("\n", errorMsg[1])
 .             stop(errorMsg)
 .         }
 .     }
 .     else {
 .         o <- lapply(...)
 .     }
 .     o
 . })(FUN = function (i = NULL, ArrowFiles = NULL, normByNeighbors = FALSE, 
 .     windowSize = NULL, excludeChr = "", stepSize = NULL, cellNames = NULL, 
 .     allCells = NULL, windows = NULL, force = FALSE, tstart = NULL, 
 .     subThreads = NULL, logFile = NULL) 
 . {
 .     ArrowFile <- ArrowFiles[i]
 .     o <- h5closeAll()
 .     o <- h5closeAll()
 .     o <- .createArrowGroup(ArrowFile = ArrowFile, group = "CNVMatrix", 
 .         force = force, logFile = logFile)
 .     tstart <- Sys.time()
 .     if (is.null(cellNames)) {
 .         cellNames <- .availableCells(ArrowFile)
 .     }
 .     if (!is.null(allCells)) {
 .         cellNames <- cellNames[cellNames %in% allCells]
 .     }
 .     uniqueChr <- as.character(unique(seqnames(windows)@values))
 .     seWindows <- .safelapply(seq_along(uniqueChr), function(x) {
 .         o <- h5closeAll()
 .         chr <- uniqueChr[x]
 .         windowsx <- windows[BiocGenerics::which(seqnames(windows) == 
 .             chr)]
 .         rangesx <- ranges(windowsx)
 .         .messageDiffTime(sprintf("Counting Windows for Chromosome %s of %s!", 
 .             x, length(uniqueChr)), tstart)
 .         fragments <- .getFragsFromArrow(ArrowFile, chr = chr, 
 .             out = "IRanges", cellNames = cellNames)
 .         temp <- IRanges(start = start(fragments), width = 1)
 .         stopifnot(length(temp) == length(fragments))
 .         oleft <- findOverlaps(ranges(rangesx), temp)
 .         oleft <- DataFrame(queryHits = Rle(queryHits(oleft)), 
 .             subjectHits = subjectHits(oleft))
 .         temp <- IRanges(start = end(fragments), width = 1)
 .         stopifnot(length(temp) == length(fragments))
 .         oright <- findOverlaps(ranges(rangesx), temp)
 .         oright <- DataFrame(queryHits = Rle(queryHits(oright)), 
 .             subjectHits = subjectHits(oright))
 .         remove(temp)
 .         oleft$subjectHits <- as.integer(BiocGenerics::match(mcols(fragments)$RG[oleft$subjectHits], 
 .             cellNames))
 .         oright$subjectHits <- as.integer(BiocGenerics::match(mcols(fragments)$RG[oright$subjectHits], 
 .             cellNames))
 .         remove(fragments)
 .         mat <- Matrix::sparseMatrix(i = c(oleft$queryHits, oright$queryHits), 
 .             j = c(oleft$subjectHits, oright$subjectHits), x = rep(1, 
 .                 nrow(oleft) + nrow(oright)), dims = c(length(rangesx), 
 .                 length(cellNames)))
 .         colnames(mat) <- cellNames
 .         windowSummary <- GRanges()
 .         countSummary <- matrix(nrow = length(unique(mcols(windowsx)$name)), 
 .             ncol = ncol(mat))
 .         rownames(countSummary) <- unique(mcols(windowsx)$name)
 .         colnames(countSummary) <- cellNames
 .         for (y in seq_len(nrow(countSummary))) {
 .             idx <- which(mcols(windowsx)$name == rownames(countSummary)[y])
 .             wx <- windowsx[idx]
 .             wo <- GRanges(mcols(wx)$wSeq, ranges = IRanges(mcols(wx)$wStart, 
 .                 mcols(wx)$wEnd))[1, ]
 .             mcols(wo)$name <- mcols(wx)$name[1]
 .             mcols(wo)$effectiveLength <- sum(width(wx))
 .             mcols(wo)$percentEffectiveLength <- 100 * sum(width(wx))/width(wo)
 .             mcols(wo)$GC <- sum(mcols(wx)$GC * width(wx))/width(wo)
 .             mcols(wo)$AT <- sum(mcols(wx)$AT * width(wx))/width(wo)
 .             mcols(wo)$N <- sum(mcols(wx)$N * width(wx))/width(wo)
 .             countSummary[y, ] <- Matrix::colSums(mat[idx, , drop = FALSE])/sum(width(wx))
 .             windowSummary <- c(windowSummary, wo)
 .         }
 .         seqlevels(windowSummary) <- uniqueChr
 .         SummarizedExperiment::SummarizedExperiment(assays = SimpleList(counts = countSummary), 
 .             rowRanges = windowSummary)
 .     }, threads = 1) %>% Reduce("rbind", .)
 .     .messageDiffTime("Filtering Low Quality Windows", tstart)
 .     seWindows <- seWindows[which(mcols(seWindows)$N < 0.001)]
 .     seWindows <- seWindows[rowData(seWindows)$percentEffectiveLength >= 
 .         90]
 .     .messageDiffTime("Normalizing GC-Bias", tstart)
 .     k <- 25
 .     seWindows <- seWindows[order(mcols(seWindows)$GC)]
 .     assays(seWindows)$log2GCNorm <- log2((assays(seWindows)$counts + 
 .         1e-05)/(apply(assays(seWindows)$counts, 2, function(x) .centerRollMean(x, 
 .         k)) + 1e-05))
 .     if (normByNeighbors) {
 .         .messageDiffTime("Normalizing Coverage-Bias using Neighbors", 
 .             tstart)
 .         totalCounts <- colSums(assays(seWindows)$counts * rowData(seWindows)$effectiveLength)
 .         seWindows <- seWindows[, order(totalCounts)]
 .         assays(seWindows)$log2GCNorm <- assays(seWindows)$log2GCNorm - 
 .             t(apply(assays(seWindows)$log2GCNorm, 1, function(y) .centerRollMean(y, 
 .                 floor(0.025 * ncol(seWindows)))))
 .         seWindows <- seWindows[, cellNames]
 .     }
 .     .messageDiffTime("Smoothing CNV Scores", tstart)
 .     seWindows <- sort(sortSeqlevels(seWindows))
 .     seWindows <- lapply(uniqueChr, function(x) {
 .         seWindowsx <- seWindows[BiocGenerics::which(seqnames(seWindows) %bcin% 
 .             x)]
 .         assays(seWindowsx)$log2GCSmooth <- apply(assays(seWindowsx)$log2GCNorm, 
 .             2, function(y) .centerRollMean(y, ceiling((windowSize/stepSize)/2)))
 .         seWindowsx
 .     }) %>% Reduce("rbind", .)
 .     windows <- lapply(split(rowRanges(seWindows), seqnames(seWindows)), 
 .         function(x) {
 .             mcols(x)$idx <- seq_along(x)
 .             x
 .         }) %>% Reduce("c", .) %>% sortSeqlevels %>% sort
 .     rowRanges(seWindows) <- windows[rownames(seWindows)]
 .     .messageDiffTime("Adding to Arrow Files", tstart)
 .     dfParams <- data.frame(windowSize = windowSize, stepSize = stepSize, 
 .         excludeChr = excludeChr, stringsAsFactors = FALSE)
 .     featureDF <- data.frame(seqnames = paste0(seqnames(seWindows)), 
 .         idx = mcols(seWindows)$idx, start = start(seWindows), 
 .         end = end(seWindows), name = mcols(seWindows)$name, GC = mcols(seWindows)$GC, 
 .         effectiveLength = mcols(seWindows)$effectiveLength, stringsAsFactors = FALSE)
 .     o <- .initializeMat(ArrowFile = ArrowFile, Group = "CNVMatrix", 
 .         Class = "Double", cellNames = cellNames, params = dfParams, 
 .         featureDF = featureDF, force = force)
 .     uniqueChr <- unique(paste0(seqnames(seWindows)))
 .     seWindows <- seWindows[, cellNames]
 .     for (x in seq_along(uniqueChr)) {
 .         o <- .addMatToArrow(mat = Matrix::Matrix(assays(seWindows[BiocGenerics::which(seqnames(seWindows) == 
 .             uniqueChr[x])])$log2GCSmooth, sparse = TRUE), ArrowFile = ArrowFile, 
 .             Group = paste0("CNVMatrix/", uniqueChr[x]), binarize = FALSE)
 .         gc()
 .     }
 .     return(ArrowFile)
 . }, X = 1L, ArrowFiles = c(core = "/hpc/pmc_stunnenberg/cruiz/scATAC/analysis/dipg_ascites/notebooks/primary_tumor/archr/Save-projdipg/ArrowFiles/core.arrow"), 
 .     allCells = c("core#CACCTGTTCTATACCT-1", "core#TGCATGACAACGGACA-1", 
 .     "core#GTCACGGCAGAATGCG-1", "core#AGGCCTGTCGCGATGC-1", "core#CTTCTAATCGTGTGCG-1", 
 .  .......
 "core#GTGCCAGGTCTCTAAG-1")))), elementType = "ANY", 
  .         elementMetadata = NULL, metadata = list()))))
16. h(simpleError(msg, call))

My session info:

R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /hpc/pmc_stunnenberg/cruiz/miniconda3/envs/r_env_4.0.3/lib/libopenblasp-r0.3.10.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg38_1.4.3 BSgenome_1.58.0                  
 [3] rtracklayer_1.50.0                Biostrings_2.58.0                
 [5] XVector_0.30.0                    Seurat_3.9.9.9023                
 [7] ArchR_1.0.1                       magrittr_2.0.1                   
 [9] rhdf5_2.34.0                      Matrix_1.2-18                    
[11] data.table_1.13.4                 SummarizedExperiment_1.20.0      
[13] Biobase_2.50.0                    GenomicRanges_1.42.0             
[15] GenomeInfoDb_1.26.2               IRanges_2.24.1                   
[17] S4Vectors_0.28.1                  BiocGenerics_0.36.0              
[19] MatrixGenerics_1.2.0              matrixStats_0.57.0               
[21] ggplot2_3.3.2.9000               

loaded via a namespace (and not attached):
  [1] Rtsne_0.15               colorspace_2.0-0         deldir_0.2-3            
  [4] ellipsis_0.3.1           ggridges_0.5.2           IRdisplay_0.7.0         
  [7] base64enc_0.1-3          spatstat.data_1.7-0      leiden_0.3.6            
 [10] listenv_0.8.0            ggrepel_0.9.0            codetools_0.2-18        
 [13] splines_4.0.3            polyclip_1.10-0          IRkernel_1.1.1          
 [16] jsonlite_1.7.2           Rsamtools_2.6.0          ica_1.0-2               
 [19] cluster_2.1.0            png_0.1-7                uwot_0.1.10             
 [22] shiny_1.5.0              sctransform_0.3.2.9000   compiler_4.0.3          
 [25] httr_1.4.2               fastmap_1.0.1            lazyeval_0.2.2          
 [28] later_1.1.0.1            htmltools_0.5.0          tools_4.0.3             
 [31] rsvd_1.0.3               igraph_1.2.6             gtable_0.3.0            
 [34] glue_1.4.2               GenomeInfoDbData_1.2.4   RANN_2.6.1              
 [37] reshape2_1.4.4           dplyr_1.0.2              Rcpp_1.0.5              
 [40] spatstat_1.64-1          scattermore_0.7          vctrs_0.3.6             
 [43] rhdf5filters_1.2.0       nlme_3.1-150             lmtest_0.9-38           
 [46] stringr_1.4.0            globals_0.14.0           mime_0.9                
 [49] miniUI_0.1.1.1           lifecycle_0.2.0          irlba_2.3.3             
 [52] XML_3.99-0.5             goftest_1.2-2            future_1.21.0           
 [55] zlibbioc_1.36.0          MASS_7.3-53              zoo_1.8-8               
 [58] scales_1.1.1             promises_1.1.1           spatstat.utils_1.17-0   
 [61] RColorBrewer_1.1-2       reticulate_1.18          pbapply_1.4-3           
 [64] gridExtra_2.3            rpart_4.1-15             stringi_1.5.3           
 [67] BiocParallel_1.24.1      repr_1.1.0               rlang_0.4.9             
 [70] pkgconfig_2.0.3          bitops_1.0-6             evaluate_0.14           
 [73] lattice_0.20-41          tensor_1.5               ROCR_1.0-11             
 [76] purrr_0.3.4              Rhdf5lib_1.12.0          GenomicAlignments_1.26.0
 [79] patchwork_1.1.1          htmlwidgets_1.5.3        cowplot_1.1.0           
 [82] tidyselect_1.1.0         parallelly_1.22.0        RcppAnnoy_0.0.18        
 [85] plyr_1.8.6               R6_2.5.0                 generics_0.1.0          
 [88] pbdZMQ_0.3-3.1           DelayedArray_0.16.0      mgcv_1.8-33             
 [91] pillar_1.4.7             withr_2.3.0              fitdistrplus_1.1-3      
 [94] abind_1.4-5              survival_3.2-7           RCurl_1.98-1.2          
 [97] tibble_3.0.4             future.apply_1.6.0       crayon_1.3.4            
[100] uuid_0.1-4               KernSmooth_2.23-18       plotly_4.9.2.2          
[103] grid_4.0.3               digest_0.6.27            xtable_1.8-4            
[106] tidyr_1.1.2              httpuv_1.5.4             munsell_0.5.0           
[109] viridisLite_0.3.0       

What do you think the problem might be? I apologize in advance to bother you with it since this isn't a function inside ArchR

Thanks in advance for your help!

ttchen0714 commented 1 year ago

Hi,

I'm a new ArchR user. I have the same problem, have you solved it? I would like to ask if you have any suggestions.

Thanks! Tina