buenrostrolab / FigR

Functional Inference of Gene Regulation
https://buenrostrolab.github.io/FigR/
MIT License
32 stars 10 forks source link

Failure to execute runGenePeakcorr #18

Closed sylestiel closed 1 year ago

sylestiel commented 1 year ago

@vkartha @jdbuenrostro

cisCorr <- FigR::runGenePeakcorr(ATAC.se,

  • RNAmat,
  • genome = "mm10",
  • keepPosCorOnly = TRUE,
  • keepMultiMappingPeaks = TRUE,# One of hg19, mm10 or hg38
  • ) Assuming paired scATAC/scRNA-seq data .. Centering counts for cells sequentially in groups of size 1000 ..

Computing centered counts for cells: 1 to 1000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 1001 to 2000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 2001 to 3000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 3001 to 4000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 4001 to 5000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 5001 to 6000 .. Computing centered counts per cell using mean reads in features ..

Computing centered counts for cells: 6001 to 6410 .. Computing centered counts per cell using mean reads in features ..

Merging results.. Done! Genes with 0 expression across cells exist .. Removing these genes prior to running correlations .. Number of peaks in ATAC data: 165027 Number of genes in RNA data: 24549

Num genes overlapping TSS annotation and RNA matrix being considered: 20267

Taking peak summits from peak windows .. Finding overlapping peak-gene pairs .. Found 0 total gene-peak pairs for given TSS window .. Number of peak summits that overlap any gene TSS window: 0 Number of gene TSS windows that overlap any peak summit: 0

Determining background peaks .. Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘letterFrequency’ for signature ‘"DNAStringSetList"’ 7: stop(gettextf("unable to find an inherited method for function %s for signature %s", sQuote(fdef@generic), sQuote(cnames)), domain = NA) 6: (function (classes, fdef, mtable) { methods <- .findInheritedMethods(classes, fdef, mtable) if (length(methods) == 1L) return(methods[[1L]]) else if (length(methods) == 0L) { cnames <- paste0("\"", vapply(classes, as.character, ""), "\"", collapse = ", ") stop(gettextf("unable to find an inherited method for function %s for signature %s", sQuote(fdef@generic), sQuote(cnames)), domain = NA) } else stop("Internal error in finding inherited methods; didn't return a unique method", domain = NA) })(list(structure("DNAStringSetList", package = "Biostrings")), new("standardGeneric", .Data = function (x, letters, OR = "|", as.prob = FALSE, ...) standardGeneric("letterFrequency"), generic = structure("letterFrequency", package = "Biostrings"), package = "Biostrings", group = list(), valueClass = character(0), signature = "x", default = NULL, skeleton = (function (x, letters, OR = "|", as.prob = FALSE, ...) ... 5: letterFrequency(seqs, c("A", "C", "G", "T")) 4: .local(object, ...) 3: chromVAR::addGCBias(ATAC.se, genome = myGenome) 2: chromVAR::addGCBias(ATAC.se, genome = myGenome) 1: FigR::runGenePeakcorr(ATAC.se, RNAmat, genome = "mm10", )

Can you help troubleshoot? Many thanks!

vkartha commented 1 year ago

Hi there!

I believe this is suspect:

Taking peak summits from peak windows .. Finding overlapping peak-gene pairs .. Found 0 total gene-peak pairs for given TSS window .. Number of peak summits that overlap any gene TSS window: 0 Number of gene TSS windows that overlap any peak summit: 0

Can you print the ranges associated with your input ATAC.se object:

rowRanges(ATAC.se)

My suspicion is that even though you are aiming to use mm10, your seqnames might not overlap anything in the built in TSSRanges mm10 object (e.g. if you don't have chr in any of your seqnames)

sylestiel commented 1 year ago

@vkartha

Hi!

You are correct. The output of rowRanges(ATAC.se) is 0.

Screen Shot 2023-01-17 at 10 05 55 AM

I'm feeding in the output of Signac 10X Multiome analysis subsequent to running SCT/PCA and TFIDF/SVD. Can you help me address this issue?

Thank you!

vkartha commented 1 year ago

If you want to feed in a Signac object, make sure you pull both the peak rowRanges and the counts appropriately. I've written a small function (below) to go from a Seurat object containing Signac ChromatinAssay info to a SummarizedExperiment object, which you should test run on your own Signac object.

It sounds like you're following the recommended workflow, which is load 10x counts into Signac and complete processing, including QC, filtering, clustering etc. Now you just need to fetch the resulting cell barcode peak counts into a SummarizedExperiment object which can be run with FigR.

# Function definition
SEfromSignac <- function(obj, # Seurat object containing assay that houses peak (ATAC) info
                         assayName="peaks", # Assay name for peak (ATAC) info
                         fetchRawCounts=TRUE){ # Whether to use raw counts (Default), otherwise fetch normalized counts if present 
  message("Pulling info from assay container: ",assayName,"\n")

  peakRanges <- obj@assays[[assayName]]@ranges

  if(fetchRawCounts){
    message("Using raw peak counts to store in SE ..\n")
    peakCounts <- obj@assays[[assayName]]@counts
  } else {
    message("WARNING: Pulling from normalized peak count slot.\n If running FigR's gene-peak association testing, make sure you set normalizeATACmat to FALSE in the runGenePeakcorr and getDORCscores functions to avoid renormalizing data internally")
  peakCounts <- obj@assays[[assayName]]@data
    }

  cellMeta <- DataFrame(obj@meta.data)

  SE <- SummarizedExperiment::SummarizedExperiment(assays = list(counts=peakCounts),
                                                   rowRanges = peakRanges,
                                                   colData = cellMeta)
  return(SE)
}

# Function call (on example data)
# Test run on mini peakset (built into Signac)
ATAC.se <- SEfromSignac(obj=Signac::atac_small,assayName = "peaks",fetchRawCounts = TRUE)`
ATAC.se

I would recommend using raw counts for FigR functions since we can internally normalize the peak count data ourselves. I have not tested FigR output on count data normalized from Signac (TF-IDF).

Please let me know if this works, I will likely include this helper function as part of the package since there are other related queries about how to start with 10x output and get to SummarizedExperiment objects that are properly formatted

sylestiel commented 1 year ago

It worked! Thanks.

As for RNA mat can I just load the SCT/PCA output?

vkartha commented 1 year ago

Correct, for the RNA you may use SCT/lognormalized counts from the Seurat object (assuming multi is the name of your object here:

rnaMat <- multi@assays$SCT@data
dim(rnaMat) # genes x cells
rnaMat[1:5,1:5] # Normalized

You can then use that as input to runGenePeakcorr