GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

addGeneExpressionMatrix incorrectly calculates "Gex_MitoRatio" #2139

Open dannyconrad opened 8 months ago

dannyconrad commented 8 months ago

This is a quick fix, but users should know that in any kind of multiome analysis, addGeneExpressionMatrix() does not calculate Gex_MitoRatio accurately. I can see two sources of error:

  1. When import10xFeatureMatrix() reads the .h5 file via .importFM(), it removes every entry in the features data frame whose interval value (e.g. chr1:10000-20000) is missing. In my own cellranger-arc outputs, this value is missing only for the mitochondrial genes, so they are dropped when the feature matrix is loaded. The check runs only if the interval column exists at all, so it does not seem to affect standard cellranger outputs, because I don't think the .h5 files they produce contain an "interval" slot.
  if ("interval" %in% colnames(rowData(se))) {
    idxNA <- which(rowData(se)$interval == "NA")
    if (length(idxNA) > 0) {
      se <- se[-idxNA, ]
    }
    rr <- GRanges(paste0(rowData(se)$interval))
    mcols(rr) <- rowData(se)
    se <- SummarizedExperiment(assays = SimpleList(counts = assay(se)), 
      rowRanges = rr)
  }
  2. Even if the mito genes are retained when loading the feature matrix, the resulting value is inflated. This is similar to the issue raised in #2000 by @Nahuck. In addGeneExpressionMatrix(), the regex pattern ^MT matches almost 100 additional genes that begin with the letters "MT", such as MTOR and MT2A. An easy fix would be to add the hyphen that delineates the mito genes in mouse and human gene annotations and to make the pattern case-insensitive: (?i)^mt-
  MitoRatio <- Matrix::colSums(assay(seRNA)[grep("^MT", rownames(assay(seRNA))), 
    ])/nUMI
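
To illustrate the second point, here is a minimal base-R sketch on a made-up count matrix. The gene list, counts, and variable names (`ratio_loose`, `ratio_strict`) are toy assumptions, not taken from ArchR. It uses `ignore.case = TRUE` instead of the inline `(?i)` flag and plain `colSums()` instead of `Matrix::colSums()` so that it runs without extra packages.

```r
# Toy gene-by-cell count matrix (invented values, for illustration only).
genes <- c("MT-ND1", "MT-CO1", "mt-Nd1", "MTOR", "MT2A", "MTHFR", "GAPDH")
counts <- matrix(
  c( 5,  5, 0, 10, 20, 10, 50,   # cell1
    10, 10, 0,  0,  0,  0, 80),  # cell2
  nrow = length(genes),
  dimnames = list(genes, c("cell1", "cell2"))
)
nUMI <- colSums(counts)

# Pattern as quoted above: also matches MTOR, MT2A and MTHFR.
idx_loose <- grep("^MT", rownames(counts))
ratio_loose <- colSums(counts[idx_loose, , drop = FALSE]) / nUMI

# Proposed pattern: requires the hyphen and ignores case.
idx_strict <- grep("^mt-", rownames(counts), ignore.case = TRUE)
ratio_strict <- colSums(counts[idx_strict, , drop = FALSE]) / nUMI

print(rownames(counts)[idx_loose])   # "MT-ND1" "MT-CO1" "MTOR" "MT2A" "MTHFR"
print(rownames(counts)[idx_strict])  # "MT-ND1" "MT-CO1" "mt-Nd1"
print(ratio_loose)                   # cell1 0.5, cell2 0.2
print(ratio_strict)                  # cell1 0.1, cell2 0.2
```

In this toy data, cell1 expresses the non-mitochondrial "MT" genes, so its ratio drops from 0.5 to 0.1 with the stricter pattern, while cell2 is unchanged.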

Using ArchR version 1.0.2
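
For the first point, the effect of the interval check can be sketched with a plain data frame standing in for rowData(se). This is only an illustration under assumptions: the feature names and interval strings are invented, and `features` and `kept` are hypothetical names, not objects from ArchR.

```r
# Toy stand-in for the features table of a cellranger-arc .h5 file.
# Intervals are invented; mito genes carry the string "NA", as described above.
features <- data.frame(
  name     = c("MTOR", "GAPDH", "MT-ND1", "MT-CO1"),
  interval = c("chr1:1000-2000", "chr12:3000-4000", "NA", "NA"),
  stringsAsFactors = FALSE
)

# Same logic as the quoted check: drop rows whose interval is the string "NA".
idxNA <- which(features$interval == "NA")
kept <- if (length(idxNA) > 0) features[-idxNA, ] else features

# No mito features survive, so any ratio computed afterwards
# can only pick up non-mitochondrial "MT..." genes.
print(kept$name)
print(grep("^mt-", kept$name, ignore.case = TRUE, value = TRUE))
```

A quick way to check a real object for the same problem would be to run the last `grep()` call on the row names of the imported feature matrix and see whether it returns anything.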

rcorces commented 8 months ago

Hi @dannyconrad! Thanks for using ArchR! Lately, it has been very challenging for me to keep up with maintenance of this package and all of my other responsibilities as a PI. I have not been responding to issue posts and I have not been pushing updates to the software. We are actively searching to hire a computational biologist to continue to develop and maintain ArchR and related tools. If you know someone who might be a good fit, please let us know! In the meantime, your issue will likely go without a reply. Most issues with ArchR right now relate to compatibility. Try reverting to R 4.1 and Bioconductor 3.15. Newer versions of Seurat and Matrix are also causing issues. Sorry for not being able to provide active support for this package at this time.