buenrostrolab / FigR

Functional Inference of Gene Regulation
https://buenrostrolab.github.io/FigR/
MIT License
31 stars 10 forks source link

Error when running runGenePeakcorr #39

Open klgoss opened 6 months ago

klgoss commented 6 months ago

Hello,

I am very excited to try this tool out! I just installed FigR and am trying it out on my 10x multiome dataset. I used a helper function I found here on Github to get my data into the correct format. When I try to run runGenePeakcorr, I get the following error:
"Assuming paired scATAC/scRNA-seq data .. Matrix object input detectedCentering counts for cells sequentially in groups of size 1000 ..

Computing centered counts for cells: 1 to 1000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 1001 to 2000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 2001 to 3000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 3001 to 4000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 4001 to 5000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 5001 to 6000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 6001 to 7000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 7001 to 8000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 8001 to 9000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 9001 to 10000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 10001 to 11000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 11001 to 12000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 12001 to 13000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 13001 to 13057 .. Computing centered counts per cell using mean reads in features ..

Merging results.. Done! Peaks with 0 accessibility across cells exist .. Removing these peaks prior to running correlations .. Important: peak indices in returned gene-peak maps are relative to original input SE Number of peaks in ATAC data: 999062 Number of genes in RNA data: 18362

Num genes overlapping TSS annotation and RNA matrix being considered: 13920

Taking peak summits from peak windows .. Finding overlapping peak-gene pairs .. Found 931735 total gene-peak pairs for given TSS window .. Number of peak summits that overlap any gene TSS window: 541706 Number of gene TSS windows that overlap any peak summit: 13838

Determining background peaks .. Using 100 iterations ..

Error in chol.default(cov(norm_mat)) : the leading minor of order 2 is not positive"

Here is my full code:



# Function definition
SEfromSignac <- function(obj, # Seurat object containing assay that houses peak (ATAC) info
                         assayName="peaks", # Assay name for peak (ATAC) info
                         fetchRawCounts=TRUE){ # Whether to use raw counts (Default), otherwise fetch normalized counts if present 
  message("Pulling info from assay container: ",assayName,"\n")

  peakRanges <- obj@assays[[assayName]]@ranges

  if(fetchRawCounts){
    message("Using raw peak counts to store in SE ..\n")
    peakCounts <- obj@assays[[assayName]]@counts
  } else {
    message("WARNING: Pulling from normalized peak count slot.\n If running FigR's gene-peak association testing, make sure you set normalizeATACmat to FALSE in the runGenePeakcorr and getDORCscores functions to avoid renormalizing data internally")
    peakCounts <- obj@assays[[assayName]]@data
  }

  cellMeta <- DataFrame(obj@meta.data)

  SE <- SummarizedExperiment::SummarizedExperiment(assays = list(counts=peakCounts),
                                                   rowRanges = peakRanges,
                                                   colData = cellMeta)
  return(SE)
}

# Function call (on example data)
# Test run on mini peakset (built into Signac)
ATAC.se <- SEfromSignac(obj=day2,assayName = "peaks",fetchRawCounts = TRUE)
ATAC.se

rnaMat <- day2@assays$SCT@data
rnaMat <- rnaMat[Matrix::rowSums(rnaMat)!=0,]

cisCor <- runGenePeakcorr(ATAC.se = ATAC.se,
                          RNAmat = rnaMat,
                          genome = "hg38",
                          nCores = 4, 
                          p.cut=NULL)```

Any insight would be greatly appreciated!!
vkartha commented 6 months ago

Hi there, sorry for the late reply - inspecting this, we haven't come across that error before so will do our best to diagnose the issue here (thanks for sharing your code). It could be a memory issue since the initial part started running no problem.

On first glance, I noticed you have ~1M accessibility peaks - can I ask how this was derived,since that seems like a lot? On that note, if possible (just for testing purposes), can you subset your peak matrix to just the first 10,000 peaks (or something) and see if you get that same error?

klgoss commented 6 months ago

No problem @vkartha , thanks for the reply. When revisiting this and running the code, I'm actually not getting that error but a different one now.

"Assuming paired scATAC/scRNA-seq data .. Matrix object input detectedCentering counts for cells sequentially in groups of size 1000 ..

Computing centered counts for cells: 1 to 1000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 1001 to 2000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 2001 to 3000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 3001 to 4000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 4001 to 5000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 5001 to 6000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 6001 to 7000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 7001 to 8000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 8001 to 9000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 9001 to 10000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 10001 to 11000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 11001 to 12000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 12001 to 13000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 13001 to 13057 .. Computing centered counts per cell using mean reads in features ..

Merging results.. Done! Error in .Call("CRsparse_rowSums", x, na.rm, FALSE, sparseResult) : "CRsparse_rowSums" not resolved from current namespace (Matrix)"

Not sure if this is an issue with the Matrix package or something else. To answer your question on the number of peaks, I had created a shared peak set from 12 different multimode objects following the Signac tutorial https://stuartlab.org/signac/articles/merging#:~:text=Creating%20a%20common%20peak%20set,prior%20to%20merging%20the%20objects . I used the "disjoin" function instead of reduce so that we would have several smaller peaks instead of large chunks.

klgoss commented 6 months ago

Hi,

I actually figured out that error was due to a package version issue and was able to fix that. Now I'm getting the original error in my first comment.

vkartha commented 5 months ago

Hi there, thanks for figuring that part out - are you able to do what I had suggested earlier just to test, i.e. restrict yourself to the first 1/10k peaks and see if it runs fully ? (in case it is a memory-related issue)? Apologies as I haven't come across this before and it is a bit hard to debug with a generic error message like that that seems internal (from another source, not our wrapper functionality)