Closed kkobayashikirschvink closed 1 year ago
This is the expected behavior and I'm not sure how to fix this. This happens at this line of code: https://github.com/GreenleafLab/ArchR/blob/2f022a448d8248a0f9afb33419bcbaeafe7731c0/R/MultiModal.R#L89
The feature matrix in the HDF5 file (accessed via features <- h5read(featureMatrix, "/matrix/features")) has a column called "interval" which gives chromosome coordinates. In the 10x multiome example datasets, the interval column is all "NA" for the mitochondrial genes, and thus these genes are excluded from the SummarizedExperiment.
> rowData(se)[grep(pattern = "MT-", x = rownames(rowData(se))),]
DataFrame with 13 rows and 5 columns
        feature_type    genome  id              interval name
        <Rle>           <Rle>   <array>         <array>  <array>
MT-ND1  Gene Expression GRCh38  ENSG00000198888 NA       MT-ND1
MT-ND2  Gene Expression GRCh38  ENSG00000198763 NA       MT-ND2
MT-CO1  Gene Expression GRCh38  ENSG00000198804 NA       MT-CO1
MT-CO2  Gene Expression GRCh38  ENSG00000198712 NA       MT-CO2
MT-ATP8 Gene Expression GRCh38  ENSG00000228253 NA       MT-ATP8
...     ...             ...     ...             ...      ...
MT-ND4L Gene Expression GRCh38  ENSG00000212907 NA       MT-ND4L
MT-ND4  Gene Expression GRCh38  ENSG00000198886 NA       MT-ND4
MT-ND5  Gene Expression GRCh38  ENSG00000198786 NA       MT-ND5
MT-ND6  Gene Expression GRCh38  ENSG00000198695 NA       MT-ND6
MT-CYB  Gene Expression GRCh38  ENSG00000198727 NA       MT-CYB
I'm not sure why these genes don't have annotated intervals (that seems like a 10x problem?). Presumably you could try to edit the intervals manually yourself.
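To see which features would be dropped for lack of coordinates, something like the following might help. It is only a sketch: the features list here is a toy stand-in for what h5read(featureMatrix, "/matrix/features") returns, the non-MT gene names and coordinates are made up, and I'm assuming the missing intervals may show up either as a real NA or as the string "NA".

```r
# Toy stand-in for the list returned by h5read(featureMatrix, "/matrix/features").
# GENE_A / GENE_B and their coordinates are hypothetical.
features <- list(
  name = c("MT-ND1", "MT-CO1", "GENE_A", "GENE_B"),
  interval = c("NA", "NA", "chr1:100-200", "chr2:300-400")
)

# Treat both a real NA and the literal string "NA" as a missing interval.
missing_interval <- is.na(features$interval) | features$interval == "NA"

# Names of the features that have no coordinates and would therefore be excluded.
dropped <- features$name[missing_interval]
print(dropped)
```

On a real file you would replace the toy list with the h5read() result and check whether everything in dropped is a mitochondrial gene.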
Anything I'm missing @jgranja24 ?
Hi Ryan,
Thank you for the reply!
I wanted to filter some cells based on their mitochondrial gene counts. What do you think is the easiest way to do this? For example, can I import other data formats to ArchR such as h5ad or Seurat files?
For your information, I contacted 10x about this, and it looks like they don't include chrM gene coordinates as they exclude them for peak calling in cellranger.
In the ARC pipeline, we consider chrM as the non_nuclear_contigs (https://support.10xgenomics.com/single-cell-multiome-atac-gex/software/release-notes/references#GRCh38-2020-A):
non_nuclear_contigs: [\"chrM\"]
Here is some more detail about 'non_nuclear_contigs' (https://support.10xgenomics.com/single-cell-multiome-atac-gex/software/pipelines/latest/advanced/references):
non_nuclear_contigs (Optional; list of strings) name(s) of contig(s) that do not have any chromatin structure, for example, mitochondria or plastids. For the GRCh38 assembly this would be ["chrM"]. These contigs are excluded from peak calling since the entire contig will be "open" due to a lack of chromatin structure.
As you can see, chrM is excluded from peak calling. Given that you are working with GEX+ATAC joint analysis, we did not include the full information of genes in chrM in GEX results as well.
Thanks for posting the information from 10x. I think the most straightforward way to do this would be to use the RNA data alone to identify the barcodes that you want to remove based on MT- genes, and then remove those cells from the ArchR project by subsetting. This will require some manual work on your end and is not part of a standard ArchR workflow at the moment, but we will take this into account since this is a common filtering step in scRNA-seq.
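A rough sketch of the barcode-selection step is below. The count matrix, barcodes, the 20% cutoff, and the sample name "sample1" are all made up for illustration; in practice the counts would come from the RNA part of the 10x feature matrix, and the cutoff is something you would pick for your own data.

```r
# Toy genes-by-barcodes RNA count matrix (all values hypothetical).
counts <- matrix(
  c(50, 10, 40,   # MT-ND1
     5,  0, 30,   # MT-CO1
    45, 90, 30),  # GENE_A, a hypothetical nuclear gene
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MT-ND1", "MT-CO1", "GENE_A"),
                  c("AAAC-1", "AAAG-1", "AAAT-1"))
)

# Percent of each barcode's counts that come from MT- genes.
is_mt <- grepl("^MT-", rownames(counts))
pct_mt <- 100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)

# Keep barcodes at or below an arbitrary, illustrative cutoff.
max_pct_mt <- 20
keep_barcodes <- colnames(counts)[pct_mt <= max_pct_mt]

# ArchR cell names have the form "sampleName#barcode"; "sample1" is assumed here.
keep_cells <- paste0("sample1#", keep_barcodes)
print(keep_cells)

# Assumed follow-up, not run here: subset an existing ArchR project to these cells,
# e.g. proj <- proj[keep_cells, ]
```

The same calculation should carry over to a sparse matrix if Matrix::colSums is used instead of the base version.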
Hi Ryan,
Thanks, that would be fantastic. For now I’ll subset cells based on barcodes as you suggested.
Better late than never. We've done a big overhaul of the multiomic import functions, and it is now possible to retain chrM genes. This is currently available on dev and will be incorporated into the next stable release. To do this, you need to tell ArchR what the gene intervals are for these genes via the features argument. This is because CellRanger doesn't provide those gene positions.
Thank you so much! This is fantastic
Dear all, I'm working on multiome data (RNA+ATAC). I want to remove MT genes from the RNA assay, so I ran:
genes.use <- grep(pattern = "^MT-", rownames(seurat), value = TRUE, invert = TRUE) # get list of non-mitochondrial genes
seurat <- subset(seurat, features = genes.use)
However, after the subset call, the new Seurat object contains only the RNA assay; the ATAC assay is no longer present. How can I solve this?
thanks in advance
Giuseppe
Hi Jeffrey and Ryan,
I was testing the multiome pipeline following the tutorial below. https://greenleaflab.github.io/ArchR_2020/Ex-Analyze-Multiome.html
However, I found that the import10xFeatureMatrix function removes all the mitochondrial genes.
Is this the default behavior? I have confirmed that they exist in the cellranger output like below.
Thanks!