Closed hugokitano closed 3 years ago
CNV analysis is not currently available in ArchR. You may get a reply here about your other questions but no guarantees. I'll leave this open for now.
Hi @hugokitano,
Unfortunately I do not think this will be a feature anytime soon. I have code that will work similar to that paper. However, we have not extensively validated this approach with real test samples. Thus, I am hesitant to add this to ArchR at this time.
Function(s) to run ---
#Run This to accesss hidden functions
fn <- unclass(lsf.str(envir = asNamespace("ArchR"), all = TRUE))
for (i in seq_along(fn)) {
tryCatch({
eval(parse(text = paste0(fn[i], "<-ArchR:::", fn[i])))
}, error = function(x) {
})
}
####################################################################
# Copy Number Variation Methods
####################################################################
#' Add a CNV matrix to ArrowFiles or an ArchRProject
#'
#' This function for each sample will predict copy number variation from accessibility
#'
#' @param input An `ArchRProject` object or character vector of ArrowFiles.
#' @param chromSizes A named numeric vector containing the chromsome names and lengths. The default behavior is to retrieve this from the `ArchRProject` using `getChromSizes()`.
#' @param blacklist A `GRanges` object containing genomic regions to blacklist from calling CNVs. The default behavior is to retrieve this from the `ArchRProject` using `getBlacklist()`.
#' @param genome The name of the genome used by the `input`. The default behavior is to retrieve this from the `ArchRProject` using `getGenome()`.
#' @param windowSize The size in basepairs for the sliding window used to break up each chromosome to look for CNVs.
#' @param stepSize The size in basepairs for the step used to create sliding window bins across each chromosome.
#' @param excludeChr A character vector containing the `seqnames` of the chromosomes that should be excluded from CNV analysis.
#' @param threads The number of threads to be used for parallel computing.
#' @param parallelParam A list of parameters to be passed for biocparallel/batchtools parallel computing.
#' @param force A boolean value indicating whether to force the CNV matrix to be overwritten if it already exists for `input`.
addCNVMatrix <- function(
input = NULL,
chromSizes = getChromSizes(input),
blacklist = getBlacklist(input),
genome = getGenome(input),
windowSize = 10e6,
stepSize = 2e6,
normByNeighbors = FALSE,
excludeChr = c("chrM","chrY"),
threads = getArchRThreads(),
parallelParam = NULL,
force = FALSE,
logFile = createLogFile("addCNVMatrix")
){
.validInput(input = input, name = "input", valid = c("character", "ArchRProj"))
.validInput(input = chromSizes, name = "chromSizes", valid = c("GRanges"))
.validInput(input = blacklist, name = "blacklist", valid = c("GRanges", "null"))
.validInput(input = genome, name = "genome", valid = c("character"))
.validInput(input = windowSize, name = "windowSize", valid = c("integer"))
.validInput(input = stepSize, name = "stepSize", valid = c("integer"))
.validInput(input = excludeChr, name = "excludeChr", valid = c("character", "null"))
.validInput(input = threads, name = "threads", valid = c("integer"))
.validInput(input = parallelParam, name = "parallelParam", valid = c("parallelparam","null"))
.validInput(input = force, name = "force", valid = c("boolean"))
if(inherits(input, "ArchRProject")){
ArrowFiles <- getArrowFiles(input)
allCells <- rownames(getCellColData(input))
outDir <- getOutputDirectory(input)
}else if(inherits(input, "character")){
outDir <- ""
ArrowFiles <- input
allCells <- NULL
}else{
stop("Error Unrecognized Input!")
}
if(!all(file.exists(ArrowFiles))){
stop("Error Input Arrow Files do not all exist!")
}
#First we need to make windows for CNV
windows <- .makeWindows(
genome = genome,
chromSizes = chromSizes,
blacklist = blacklist,
windowSize = windowSize,
stepSize = stepSize,
threads = threads
)
windows <- windows[BiocGenerics::which(seqnames(windows) %bcni% excludeChr)]
#Add args to list
args <- list() #mget(names(formals()),sys.frame(sys.nframe()))#as.list(match.call())
args$FUN <- .addCNVMatrix
args$X <- seq_along(ArrowFiles)
args$ArrowFiles <- ArrowFiles
args$allCells <- allCells
args$windows <- windows
args$registryDir <- file.path(outDir, "CNVRegistry")
args$force <- force
args$threads <- threads
args$logFile <- logFile
args$normByNeighbors <- normByNeighbors
args$windowSize <- windowSize
args$stepSize <- stepSize
args$excludeChr <- excludeChr
#Run With Parallel or lapply
outList <- .batchlapply(args)
if(inherits(input, "ArchRProject")){
return(input)
}else{
return(unlist(outList))
}
}
.addCNVMatrix <- function(
i = NULL,
ArrowFiles = NULL,
normByNeighbors = FALSE,
windowSize = NULL,
excludeChr = "",
stepSize = NULL,
cellNames = NULL,
allCells = NULL,
windows = NULL,
force = FALSE,
tstart = NULL,
subThreads = NULL,
logFile = NULL
){
ArrowFile <- ArrowFiles[i]
o <- h5closeAll()
#Check
o <- h5closeAll()
o <- .createArrowGroup(ArrowFile = ArrowFile, group = "CNVMatrix", force = force, logFile = logFile)
# if(!suppressMessages(h5createGroup(file = ArrowFile, "CNVMatrix"))){
# if(force){
# o <- h5delete(file = ArrowFile, name = "CNVMatrix")
# o <- h5createGroup(ArrowFile, "CNVMatrix")
# }else{
# stop(sprintf("%s Already Exists!, set force = TRUE to override!", "CNVMatrix"))
# }
# }
tstart <- Sys.time()
#Get all cell ids before constructing matrix
if(is.null(cellNames)){
cellNames <- .availableCells(ArrowFile)
}
if(!is.null(allCells)){
cellNames <- cellNames[cellNames %in% allCells]
}
######################################
# Create Window Matrices Then Summarize to CNV Estimates
######################################
uniqueChr <- as.character(unique(seqnames(windows)@values))
seWindows <- .safelapply(seq_along(uniqueChr), function(x){
o <- h5closeAll()
chr <- uniqueChr[x]
windowsx <- windows[BiocGenerics::which(seqnames(windows)==chr)]
rangesx <- ranges(windowsx)
.messageDiffTime(sprintf("Counting Windows for Chromosome %s of %s!", x, length(uniqueChr)), tstart)
#Read in Fragments
fragments <- .getFragsFromArrow(ArrowFile, chr = chr, out = "IRanges", cellNames = cellNames)
#Count Left Insertion
temp <- IRanges(start = start(fragments), width = 1)
stopifnot(length(temp) == length(fragments))
oleft <- findOverlaps(ranges(rangesx), temp)
oleft <- DataFrame(queryHits=Rle(queryHits(oleft)), subjectHits = subjectHits(oleft))
#Count Right Insertion
temp <- IRanges(start = end(fragments), width = 1)
stopifnot(length(temp) == length(fragments))
oright <- findOverlaps(ranges(rangesx), temp)
oright <- DataFrame(queryHits=Rle(queryHits(oright)), subjectHits = subjectHits(oright))
remove(temp)
#Correct to RG ID
oleft$subjectHits <- as.integer(BiocGenerics::match(mcols(fragments)$RG[oleft$subjectHits], cellNames))
oright$subjectHits <- as.integer(BiocGenerics::match(mcols(fragments)$RG[oright$subjectHits], cellNames))
remove(fragments)
#Create Sparse Matrix
mat <- Matrix::sparseMatrix(
i = c( oleft$queryHits, oright$queryHits ),
j = c( oleft$subjectHits, oright$subjectHits ),
x = rep(1, nrow(oleft) + nrow(oright)),
dims = c(length(rangesx), length(cellNames))
)
colnames(mat) <- cellNames
#Summarize Split Windows
windowSummary <- GRanges()
countSummary <- matrix(nrow=length(unique(mcols(windowsx)$name)), ncol = ncol(mat))
rownames(countSummary) <- unique(mcols(windowsx)$name)
colnames(countSummary) <- cellNames
for(y in seq_len(nrow(countSummary))){
idx <- which(mcols(windowsx)$name == rownames(countSummary)[y])
wx <- windowsx[idx]
wo <- GRanges(mcols(wx)$wSeq , ranges = IRanges(mcols(wx)$wStart, mcols(wx)$wEnd))[1,]
mcols(wo)$name <- mcols(wx)$name[1]
mcols(wo)$effectiveLength <- sum(width(wx))
mcols(wo)$percentEffectiveLength <- 100*sum(width(wx))/width(wo)
mcols(wo)$GC <- sum(mcols(wx)$GC * width(wx))/width(wo)
mcols(wo)$AT <- sum(mcols(wx)$AT * width(wx))/width(wo)
mcols(wo)$N <- sum(mcols(wx)$N * width(wx))/width(wo)
countSummary[y,] <- Matrix::colSums(mat[idx, ,drop=FALSE]) / sum(width(wx))
windowSummary <- c(windowSummary, wo)
}
seqlevels(windowSummary) <- uniqueChr
SummarizedExperiment::SummarizedExperiment(assays = SimpleList(counts = countSummary), rowRanges = windowSummary)
}, threads = 1) %>% Reduce("rbind", .)
.messageDiffTime("Filtering Low Quality Windows", tstart)
#Keep only regions with less than 0.1% N
seWindows <- seWindows[which(mcols(seWindows)$N < 0.001) ]
#Keep only regions with effective size greater than 90%
seWindows <- seWindows[rowData(seWindows)$percentEffectiveLength >= 90]
#Normalize By Nucleotide Content KNN
.messageDiffTime("Normalizing GC-Bias", tstart)
k <- 25
seWindows <- seWindows[order(mcols(seWindows)$GC)]
assays(seWindows)$log2GCNorm <- log2( (assays(seWindows)$counts + 1e-5) / (apply(assays(seWindows)$counts, 2, function(x) .centerRollMean(x, k)) + 1e-5))
#Normalize By Cells With Similar Total Counts Genome Wide
if(normByNeighbors){
.messageDiffTime("Normalizing Coverage-Bias using Neighbors", tstart)
totalCounts <- colSums(assays(seWindows)$counts * rowData(seWindows)$effectiveLength)
seWindows <- seWindows[, order(totalCounts)]
assays(seWindows)$log2GCNorm <- assays(seWindows)$log2GCNorm - t(apply(assays(seWindows)$log2GCNorm, 1, function(y) .centerRollMean(y, floor(0.025 * ncol(seWindows)))))
seWindows <- seWindows[, cellNames]
}
#Re-Order Ranges and Smooth Across Individual Chromosomes
.messageDiffTime("Smoothing CNV Scores", tstart)
seWindows <- sort(sortSeqlevels(seWindows))
seWindows <- lapply(uniqueChr, function(x){
seWindowsx <- seWindows[BiocGenerics::which(seqnames(seWindows) %bcin% x)]
assays(seWindowsx)$log2GCSmooth <- apply(assays(seWindowsx)$log2GCNorm, 2, function(y) .centerRollMean(y, ceiling( (windowSize / stepSize) / 2)))
seWindowsx
}) %>% Reduce("rbind", .)
#Index Windows
windows <- lapply(split(rowRanges(seWindows), seqnames(seWindows)), function(x) {
mcols(x)$idx <- seq_along(x)
x
}) %>% Reduce("c", .) %>% sortSeqlevels %>% sort
rowRanges(seWindows) <- windows[rownames(seWindows)]
#Write To Arrow Files
.messageDiffTime("Adding to Arrow Files", tstart)
dfParams <- data.frame(
windowSize = windowSize,
stepSize = stepSize,
excludeChr = excludeChr,
stringsAsFactors = FALSE)
featureDF <- data.frame(
seqnames = paste0(seqnames(seWindows)),
idx = mcols(seWindows)$idx,
start = start(seWindows),
end = end(seWindows),
name = mcols(seWindows)$name,
GC = mcols(seWindows)$GC,
effectiveLength = mcols(seWindows)$effectiveLength,
stringsAsFactors = FALSE)
######################################
# Initialize Mat Group
######################################
o <- .initializeMat(
ArrowFile = ArrowFile,
Group = "CNVMatrix",
Class = "Double",
cellNames = cellNames,
params = dfParams,
featureDF = featureDF,
force = force
)
######################################
# Write Mat Group
######################################
uniqueChr <- unique(paste0(seqnames(seWindows)))
seWindows <- seWindows[, cellNames]
for(x in seq_along(uniqueChr)){
#Write Matrix to Arrow File!
o <- .addMatToArrow(
mat = Matrix::Matrix(assays(seWindows[BiocGenerics::which(seqnames(seWindows) == uniqueChr[x])])$log2GCSmooth, sparse = TRUE),
ArrowFile = ArrowFile,
Group = paste0("CNVMatrix/", uniqueChr[x]),
binarize = FALSE
)
gc()
}
return(ArrowFile)
}
.makeWindows <- function(
genome = NULL,
chromSizes = NULL,
blacklist = NULL,
windowSize = 10e6,
stepSize = 2e6,
threads = 1
){
genome <- validBSgenome(genome)
#Sliding Windows
windows <- slidingWindows(x = chromSizes, width = windowSize, step = stepSize) %>%
.safeUnlist %>%
.[which(width(.)==windowSize),]
#Add OG Info Prior to Breaking Windows Up Etc
mcols(windows)$wSeq <- as.character(seqnames(windows))
mcols(windows)$wStart <- start(windows)
mcols(windows)$wEnd <- end(windows)
message("Subtracting Blacklist from Windows...")
windowsBL <- .safelapply(seq_along(windows), function(x){
if(x %% 100 == 0){
message(sprintf("%s of %s", x, length(windows)))
}
gr <- GenomicRanges::setdiff(windows[x,], blacklist)
mcols(gr) <- mcols(windows[x,])
return(gr)
}, threads = threads)
names(windowsBL) <- paste0("w",seq_along(windowsBL))
windowsBL <- .safeUnlist(windowsBL, use.names = TRUE)
mcols(windowsBL)$name <- names(windowsBL)
names(windowsBL) <- paste0("s",seq_along(windowsBL))
message("Adding Nucleotide Information...")
windowSplit <- split(windowsBL, as.character(seqnames(windowsBL)))
windowNuc <- .safelapply(seq_along(windowSplit), function(x){
message(".", appendLF = FALSE)
chrSeq <- Biostrings::getSeq(genome, chromSizes[BiocGenerics::which(seqnames(chromSizes) == names(windowSplit)[x])])
grx <- windowSplit[[x]]
aFreq <- alphabetFrequency(Biostrings::Views(chrSeq[[1]], ranges(grx)))
mcols(grx)$GC <- rowSums(aFreq[, c("G","C")]) / rowSums(aFreq)
mcols(grx)$AT <- rowSums(aFreq[, c("A","T")]) / rowSums(aFreq)
return(grx)
}, threads = threads) %>% .safeUnlist %>% sortSeqlevels %>% sort
message("\n")
windowNuc$N <- 1 - (windowNuc$GC + windowNuc$AT)
windowNuc
}
.safeUnlist <- function(x, use.names = TRUE){
y <- unlist(x, use.names = use.names)
if(identical(y, x)){
isGRL <- ArchR:::.isGRList(x)
if(isGRL){
for(i in seq_along(x)){
if(i == 1){
y <- x[[i]]
if(length(y) != 0) if(use.names) names(y) <- rep(names(x)[i], length(y))
}else{
yi <- x[[i]]
if(length(yi) != 0) if(use.names) names(yi) <- rep(names(x)[i], length(yi))
y <- c(y, yi)
}
}
}else{
stop("Unlist does not collapse/change input and is not GRangesList!")
}
}
y
}
To use -
proj <- getTestProject()
proj <- addCNVMatrix(proj)
CNVMat <- getMatrixFromProject(proj, "CNVMatrix")
Feel free to try it out. I am not going to be maintaining this too much more at this time, but you can feel free to comment further. I am closing this issue but you can still reply to the issue and I will look when I have time.
Thanks for sharing the code @jgranja24! I have given it a try but got an error:
Subtracting Blacklist from Windows...
100 of 1438
200 of 1438
300 of 1438
400 of 1438
500 of 1438
600 of 1438
700 of 1438
800 of 1438
900 of 1438
1000 of 1438
1100 of 1438
1200 of 1438
1300 of 1438
1400 of 1438
Adding Nucleotide Information...
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
2020-12-29 16:15:44 : Batch Execution w/ safelapply!, 0 mins elapsed.
Can not create group. Object with name 'CNVMatrix' already exists.
2020-12-29 16:15:45 : Counting Windows for Chromosome 1 of 23!, 0.002 mins elapsed.
2020-12-29 16:16:49 : Counting Windows for Chromosome 2 of 23!, 1.067 mins elapsed.
2020-12-29 16:17:59 : Counting Windows for Chromosome 3 of 23!, 2.226 mins elapsed.
2020-12-29 16:18:46 : Counting Windows for Chromosome 4 of 23!, 3.017 mins elapsed.
2020-12-29 16:19:30 : Counting Windows for Chromosome 5 of 23!, 3.753 mins elapsed.
2020-12-29 16:20:13 : Counting Windows for Chromosome 6 of 23!, 4.466 mins elapsed.
2020-12-29 16:20:53 : Counting Windows for Chromosome 7 of 23!, 5.126 mins elapsed.
2020-12-29 16:21:33 : Counting Windows for Chromosome 8 of 23!, 5.8 mins elapsed.
2020-12-29 16:22:02 : Counting Windows for Chromosome 9 of 23!, 6.287 mins elapsed.
2020-12-29 16:22:35 : Counting Windows for Chromosome 10 of 23!, 6.826 mins elapsed.
2020-12-29 16:23:09 : Counting Windows for Chromosome 11 of 23!, 7.402 mins elapsed.
2020-12-29 16:23:43 : Counting Windows for Chromosome 12 of 23!, 7.969 mins elapsed.
2020-12-29 16:24:23 : Counting Windows for Chromosome 13 of 23!, 8.627 mins elapsed.
2020-12-29 16:24:43 : Counting Windows for Chromosome 14 of 23!, 8.973 mins elapsed.
2020-12-29 16:25:03 : Counting Windows for Chromosome 15 of 23!, 9.295 mins elapsed.
2020-12-29 16:25:25 : Counting Windows for Chromosome 16 of 23!, 9.658 mins elapsed.
2020-12-29 16:25:50 : Counting Windows for Chromosome 17 of 23!, 10.091 mins elapsed.
2020-12-29 16:26:16 : Counting Windows for Chromosome 18 of 23!, 10.515 mins elapsed.
2020-12-29 16:26:32 : Counting Windows for Chromosome 19 of 23!, 10.78 mins elapsed.
2020-12-29 16:26:50 : Counting Windows for Chromosome 20 of 23!, 11.086 mins elapsed.
2020-12-29 16:27:06 : Counting Windows for Chromosome 21 of 23!, 11.347 mins elapsed.
2020-12-29 16:27:14 : Counting Windows for Chromosome 22 of 23!, 11.483 mins elapsed.
2020-12-29 16:27:25 : Counting Windows for Chromosome 23 of 23!, 11.673 mins elapsed.
2020-12-29 16:27:59 : Filtering Low Quality Windows, 12.225 mins elapsed.
2020-12-29 16:27:59 : Normalizing GC-Bias, 12.231 mins elapsed.
2020-12-29 16:28:02 : Smoothing CNV Scores, 12.284 mins elapsed.
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'Reduce': please use 'assay(x, withDimnames=FALSE)) <- value' or 'assays(x,
withDimnames=FALSE)) <- value' when the dimnames on the supplied
assay(s) are not identical to the dimnames on
RangedSummarizedExperiment object 'x'
Traceback:
1. addCNVMatrix(projdipg)
2. .batchlapply(args) # at line 97 of file <text>
3. do.call(.safelapply, args)
4. (function (..., threads = 1, preschedule = FALSE)
. {
. if (tolower(.Platform$OS.type) == "windows") {
. threads <- 1
. }
. if (threads > 1) {
. o <- mclapply(..., mc.cores = threads, mc.preschedule = preschedule)
. errorMsg <- list()
. for (i in seq_along(o)) {
. if (inherits(o[[i]], "try-error")) {
. capOut <- utils::capture.output(o[[i]])
. capOut <- capOut[!grepl("attr\\(\\,|try-error",
. capOut)]
. capOut <- head(capOut, 10)
. capOut <- unlist(lapply(capOut, function(x) substr(x,
. 1, 250)))
. capOut <- paste0("\t", capOut)
. errorMsg[[length(errorMsg) + 1]] <- paste0(c(paste0("Error Found Iteration ",
. i, " : "), capOut), "\n")
. }
. }
. if (length(errorMsg) != 0) {
. errorMsg <- unlist(errorMsg)
. errorMsg <- head(errorMsg, 50)
. errorMsg[1] <- paste0("\n", errorMsg[1])
. stop(errorMsg)
. }
. }
. else {
. o <- lapply(...)
. }
. o
. })(FUN = function (i = NULL, ArrowFiles = NULL, normByNeighbors = FALSE,
. windowSize = NULL, excludeChr = "", stepSize = NULL, cellNames = NULL,
. allCells = NULL, windows = NULL, force = FALSE, tstart = NULL,
. subThreads = NULL, logFile = NULL)
. {
. ArrowFile <- ArrowFiles[i]
. o <- h5closeAll()
. o <- h5closeAll()
. o <- .createArrowGroup(ArrowFile = ArrowFile, group = "CNVMatrix",
. force = force, logFile = logFile)
. tstart <- Sys.time()
. if (is.null(cellNames)) {
. cellNames <- .availableCells(ArrowFile)
. }
. if (!is.null(allCells)) {
. cellNames <- cellNames[cellNames %in% allCells]
. }
. uniqueChr <- as.character(unique(seqnames(windows)@values))
. seWindows <- .safelapply(seq_along(uniqueChr), function(x) {
. o <- h5closeAll()
. chr <- uniqueChr[x]
. windowsx <- windows[BiocGenerics::which(seqnames(windows) ==
. chr)]
. rangesx <- ranges(windowsx)
. .messageDiffTime(sprintf("Counting Windows for Chromosome %s of %s!",
. x, length(uniqueChr)), tstart)
. fragments <- .getFragsFromArrow(ArrowFile, chr = chr,
. out = "IRanges", cellNames = cellNames)
. temp <- IRanges(start = start(fragments), width = 1)
. stopifnot(length(temp) == length(fragments))
. oleft <- findOverlaps(ranges(rangesx), temp)
. oleft <- DataFrame(queryHits = Rle(queryHits(oleft)),
. subjectHits = subjectHits(oleft))
. temp <- IRanges(start = end(fragments), width = 1)
. stopifnot(length(temp) == length(fragments))
. oright <- findOverlaps(ranges(rangesx), temp)
. oright <- DataFrame(queryHits = Rle(queryHits(oright)),
. subjectHits = subjectHits(oright))
. remove(temp)
. oleft$subjectHits <- as.integer(BiocGenerics::match(mcols(fragments)$RG[oleft$subjectHits],
. cellNames))
. oright$subjectHits <- as.integer(BiocGenerics::match(mcols(fragments)$RG[oright$subjectHits],
. cellNames))
. remove(fragments)
. mat <- Matrix::sparseMatrix(i = c(oleft$queryHits, oright$queryHits),
. j = c(oleft$subjectHits, oright$subjectHits), x = rep(1,
. nrow(oleft) + nrow(oright)), dims = c(length(rangesx),
. length(cellNames)))
. colnames(mat) <- cellNames
. windowSummary <- GRanges()
. countSummary <- matrix(nrow = length(unique(mcols(windowsx)$name)),
. ncol = ncol(mat))
. rownames(countSummary) <- unique(mcols(windowsx)$name)
. colnames(countSummary) <- cellNames
. for (y in seq_len(nrow(countSummary))) {
. idx <- which(mcols(windowsx)$name == rownames(countSummary)[y])
. wx <- windowsx[idx]
. wo <- GRanges(mcols(wx)$wSeq, ranges = IRanges(mcols(wx)$wStart,
. mcols(wx)$wEnd))[1, ]
. mcols(wo)$name <- mcols(wx)$name[1]
. mcols(wo)$effectiveLength <- sum(width(wx))
. mcols(wo)$percentEffectiveLength <- 100 * sum(width(wx))/width(wo)
. mcols(wo)$GC <- sum(mcols(wx)$GC * width(wx))/width(wo)
. mcols(wo)$AT <- sum(mcols(wx)$AT * width(wx))/width(wo)
. mcols(wo)$N <- sum(mcols(wx)$N * width(wx))/width(wo)
. countSummary[y, ] <- Matrix::colSums(mat[idx, , drop = FALSE])/sum(width(wx))
. windowSummary <- c(windowSummary, wo)
. }
. seqlevels(windowSummary) <- uniqueChr
. SummarizedExperiment::SummarizedExperiment(assays = SimpleList(counts = countSummary),
. rowRanges = windowSummary)
. }, threads = 1) %>% Reduce("rbind", .)
. .messageDiffTime("Filtering Low Quality Windows", tstart)
. seWindows <- seWindows[which(mcols(seWindows)$N < 0.001)]
. seWindows <- seWindows[rowData(seWindows)$percentEffectiveLength >=
. 90]
. .messageDiffTime("Normalizing GC-Bias", tstart)
. k <- 25
. seWindows <- seWindows[order(mcols(seWindows)$GC)]
. assays(seWindows)$log2GCNorm <- log2((assays(seWindows)$counts +
. 1e-05)/(apply(assays(seWindows)$counts, 2, function(x) .centerRollMean(x,
. k)) + 1e-05))
. if (normByNeighbors) {
. .messageDiffTime("Normalizing Coverage-Bias using Neighbors",
. tstart)
. totalCounts <- colSums(assays(seWindows)$counts * rowData(seWindows)$effectiveLength)
. seWindows <- seWindows[, order(totalCounts)]
. assays(seWindows)$log2GCNorm <- assays(seWindows)$log2GCNorm -
. t(apply(assays(seWindows)$log2GCNorm, 1, function(y) .centerRollMean(y,
. floor(0.025 * ncol(seWindows)))))
. seWindows <- seWindows[, cellNames]
. }
. .messageDiffTime("Smoothing CNV Scores", tstart)
. seWindows <- sort(sortSeqlevels(seWindows))
. seWindows <- lapply(uniqueChr, function(x) {
. seWindowsx <- seWindows[BiocGenerics::which(seqnames(seWindows) %bcin%
. x)]
. assays(seWindowsx)$log2GCSmooth <- apply(assays(seWindowsx)$log2GCNorm,
. 2, function(y) .centerRollMean(y, ceiling((windowSize/stepSize)/2)))
. seWindowsx
. }) %>% Reduce("rbind", .)
. windows <- lapply(split(rowRanges(seWindows), seqnames(seWindows)),
. function(x) {
. mcols(x)$idx <- seq_along(x)
. x
. }) %>% Reduce("c", .) %>% sortSeqlevels %>% sort
. rowRanges(seWindows) <- windows[rownames(seWindows)]
. .messageDiffTime("Adding to Arrow Files", tstart)
. dfParams <- data.frame(windowSize = windowSize, stepSize = stepSize,
. excludeChr = excludeChr, stringsAsFactors = FALSE)
. featureDF <- data.frame(seqnames = paste0(seqnames(seWindows)),
. idx = mcols(seWindows)$idx, start = start(seWindows),
. end = end(seWindows), name = mcols(seWindows)$name, GC = mcols(seWindows)$GC,
. effectiveLength = mcols(seWindows)$effectiveLength, stringsAsFactors = FALSE)
. o <- .initializeMat(ArrowFile = ArrowFile, Group = "CNVMatrix",
. Class = "Double", cellNames = cellNames, params = dfParams,
. featureDF = featureDF, force = force)
. uniqueChr <- unique(paste0(seqnames(seWindows)))
. seWindows <- seWindows[, cellNames]
. for (x in seq_along(uniqueChr)) {
. o <- .addMatToArrow(mat = Matrix::Matrix(assays(seWindows[BiocGenerics::which(seqnames(seWindows) ==
. uniqueChr[x])])$log2GCSmooth, sparse = TRUE), ArrowFile = ArrowFile,
. Group = paste0("CNVMatrix/", uniqueChr[x]), binarize = FALSE)
. gc()
. }
. return(ArrowFile)
. }, X = 1L, ArrowFiles = c(core = "/hpc/pmc_stunnenberg/cruiz/scATAC/analysis/dipg_ascites/notebooks/primary_tumor/archr/Save-projdipg/ArrowFiles/core.arrow"),
. allCells = c("core#CACCTGTTCTATACCT-1", "core#TGCATGACAACGGACA-1",
. "core#GTCACGGCAGAATGCG-1", "core#AGGCCTGTCGCGATGC-1", "core#CTTCTAATCGTGTGCG-1",
. .......
"core#GTGCCAGGTCTCTAAG-1")))), elementType = "ANY",
. elementMetadata = NULL, metadata = list()))))
16. h(simpleError(msg, call))
My session info:
R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /hpc/pmc_stunnenberg/cruiz/miniconda3/envs/r_env_4.0.3/lib/libopenblasp-r0.3.10.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] BSgenome.Hsapiens.UCSC.hg38_1.4.3 BSgenome_1.58.0
[3] rtracklayer_1.50.0 Biostrings_2.58.0
[5] XVector_0.30.0 Seurat_3.9.9.9023
[7] ArchR_1.0.1 magrittr_2.0.1
[9] rhdf5_2.34.0 Matrix_1.2-18
[11] data.table_1.13.4 SummarizedExperiment_1.20.0
[13] Biobase_2.50.0 GenomicRanges_1.42.0
[15] GenomeInfoDb_1.26.2 IRanges_2.24.1
[17] S4Vectors_0.28.1 BiocGenerics_0.36.0
[19] MatrixGenerics_1.2.0 matrixStats_0.57.0
[21] ggplot2_3.3.2.9000
loaded via a namespace (and not attached):
[1] Rtsne_0.15 colorspace_2.0-0 deldir_0.2-3
[4] ellipsis_0.3.1 ggridges_0.5.2 IRdisplay_0.7.0
[7] base64enc_0.1-3 spatstat.data_1.7-0 leiden_0.3.6
[10] listenv_0.8.0 ggrepel_0.9.0 codetools_0.2-18
[13] splines_4.0.3 polyclip_1.10-0 IRkernel_1.1.1
[16] jsonlite_1.7.2 Rsamtools_2.6.0 ica_1.0-2
[19] cluster_2.1.0 png_0.1-7 uwot_0.1.10
[22] shiny_1.5.0 sctransform_0.3.2.9000 compiler_4.0.3
[25] httr_1.4.2 fastmap_1.0.1 lazyeval_0.2.2
[28] later_1.1.0.1 htmltools_0.5.0 tools_4.0.3
[31] rsvd_1.0.3 igraph_1.2.6 gtable_0.3.0
[34] glue_1.4.2 GenomeInfoDbData_1.2.4 RANN_2.6.1
[37] reshape2_1.4.4 dplyr_1.0.2 Rcpp_1.0.5
[40] spatstat_1.64-1 scattermore_0.7 vctrs_0.3.6
[43] rhdf5filters_1.2.0 nlme_3.1-150 lmtest_0.9-38
[46] stringr_1.4.0 globals_0.14.0 mime_0.9
[49] miniUI_0.1.1.1 lifecycle_0.2.0 irlba_2.3.3
[52] XML_3.99-0.5 goftest_1.2-2 future_1.21.0
[55] zlibbioc_1.36.0 MASS_7.3-53 zoo_1.8-8
[58] scales_1.1.1 promises_1.1.1 spatstat.utils_1.17-0
[61] RColorBrewer_1.1-2 reticulate_1.18 pbapply_1.4-3
[64] gridExtra_2.3 rpart_4.1-15 stringi_1.5.3
[67] BiocParallel_1.24.1 repr_1.1.0 rlang_0.4.9
[70] pkgconfig_2.0.3 bitops_1.0-6 evaluate_0.14
[73] lattice_0.20-41 tensor_1.5 ROCR_1.0-11
[76] purrr_0.3.4 Rhdf5lib_1.12.0 GenomicAlignments_1.26.0
[79] patchwork_1.1.1 htmlwidgets_1.5.3 cowplot_1.1.0
[82] tidyselect_1.1.0 parallelly_1.22.0 RcppAnnoy_0.0.18
[85] plyr_1.8.6 R6_2.5.0 generics_0.1.0
[88] pbdZMQ_0.3-3.1 DelayedArray_0.16.0 mgcv_1.8-33
[91] pillar_1.4.7 withr_2.3.0 fitdistrplus_1.1-3
[94] abind_1.4-5 survival_3.2-7 RCurl_1.98-1.2
[97] tibble_3.0.4 future.apply_1.6.0 crayon_1.3.4
[100] uuid_0.1-4 KernSmooth_2.23-18 plotly_4.9.2.2
[103] grid_4.0.3 digest_0.6.27 xtable_1.8-4
[106] tidyr_1.1.2 httpuv_1.5.4 munsell_0.5.0
[109] viridisLite_0.3.0
What do you think the problem might be? I apologize in advance to bother you with it since this isn't a function inside ArchR
Thanks in advance for your help!
Hi,
I'm a new ArchR user. I have the same problem, have you solved it? I would like to ask if you have any suggestions.
Thanks! Tina
Hi, I'm an ArchR user that was directed to this repository https://github.com/GreenleafLab/10x-scATAC-2019 to perform CNV Analysis since the feature is not yet officially available in ArchR.
I posted an issue there, but realized since that project is dormant it may be wiser to post it here. First, is CNV analysis available on ArchR developmentally?
If not, I have a few questions about how this repository works. I see that the work is transferred from step to step via RDS files, so since CNV analysis is the 8th step, I will have to do some number of steps before. Which are necessary?
Another question I have is how to input multiple fragment files. I noticed that a list of vector paths, like in ArchR, doesn't work on the first step, though a single file path does. Is there a way to do this with multiple files?
Thanks!
Hugo Kitano