GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License
384 stars 137 forks source link

addGeneExpressionMatrix error with a new species #1286

Closed Goultard59 closed 2 years ago

Goultard59 commented 2 years ago

Describe the bug I try to add an expression matrix from 10x single cell multiome analysis following the step from this tutorial : https://www.hdsu.org/sincellTE_2022/session2/01_multiome_ArchR.html with some adaptation for my species (pig sus scrofa), Cell ranger have been performed with pig 11.1 genome sequence and a custom genome annotation.

To Reproduce

library(ArchR)
library(TxDb.Sscrofa.UCSC.susScr11.refGene)
library(org.Ss.eg.db)
library(BSgenome.Sscrofa.UCSC.susScr11)

txdb <- TxDb.Sscrofa.UCSC.susScr11.refGene
orgdb <- org.Ss.eg.db

geneAnnotation <- createGeneAnnotation(TxDb = txdb, OrgDb = orgdb)
genomeAnnotation <- createGenomeAnnotation(BSgenome.Sscrofa.UCSC.susScr11)

addArchRThreads(10)

work_dir <- paste0("/home/adufour/work/archr_j7")
data_dir <- "/home/adufour/work/workspace/cellranger_arc/count/J7o_1.Sus_scrofa.11.1.102.quant3p_extended.updated.sorted.2021_12_09"
dir.create(work_dir, recursive = TRUE)
setwd(work_dir)

# Create Arrow file
createArrowFiles(inputFiles  = "/home/adufour/work/workspace/cellranger_arc/count/J7o_1.Sus_scrofa.11.1.102.quant3p_extended.updated.sorted.2021_12_09/J7o_1/outs/atac_fragments.tsv.gz", 
                 sampleNames = "J7", 
                 QCDir       = "QualityControl",
                 logFile     = createLogFile(name = "createArrows", 
                                             logDir = "ArchRLogs"),
                 geneAnnotation = geneAnnotation,
                 genomeAnnotation = genomeAnnotation,
                 force       = TRUE)

archrproj <- ArchRProject(ArrowFiles = paste0(work_dir, "/J7.arrow"),
                geneAnnotation = geneAnnotation,
                genomeAnnotation = genomeAnnotation)

scRNA <- import10xFeatureMatrix(
    input = "/home/adufour/work/workspace/cellranger_arc/count/J7o_1.Sus_scrofa.11.1.102.quant3p_extended.updated.sorted.2021_12_09/J7o_1/outs/filtered_feature_bc_matrix.h5",
    names = "J7")
scRNA

archrproj <- addGeneExpressionMatrix(input = archrproj, seRNA = scRNA, force = TRUE)

Session Info ArchR-addGeneExpressionMatrix-23801808c203-Date-2022-02-10_Time-14-23-14.log

Additional context Error message

ArchR logging to : ArchRLogs/ArchR-addGeneExpressionMatrix-23801808c203-Date-2022-02-10_Time-14-23-14.log
If there is an issue, please report to github with logFile!

Overlap w/ scATAC = 0.982

2022-02-10 14:23:15 : 

Overlap Per Sample w/ scATAC : J7=1151

2022-02-10 14:23:15 : 

2022-02-10 14:23:15 : Batch Execution w/ safelapply!, 0 mins elapsed.

Error in .initializeMat(ArrowFile = ArrowFile, Group = matrixName, Class = "double", : all(c("seqnames", "idx") %in% colnames(featureDF)) n'est pas TRUE
Traceback:
rcorces commented 2 years ago

Hi @Goultard59! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information: 1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved? 2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example. 3. Did you post your log file? If not, add it now.

Goultard59 commented 2 years ago

Hi @Goultard59! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues. Before we help you, you must respond to the following questions unless your original post already contained this information: 1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved? 2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example. 3. Did you post your log file? If not, add it now.

  1. Yes but i didn't find my answer
  2. I didn't encountered error using the tutorial human dataset
rcorces commented 2 years ago

I'm not 100% sure. It looks like something is not properly formatted with your seRNA object. It seems like your featureDF is empty, which is created here: https://github.com/GreenleafLab/ArchR/blob/968e4421ce7187a8ac7ea1cf6077412126876d5f/R/MatrixGeneExpression.R#L205

This is likely unrelated to your error but your scRNA object looks pretty odd. It only has information for 6 genes.

addGeneExpressionMatrix Input-Parameters$seRNA: length = 21288
class: RangedSummarizedExperiment 
dim: 6 2397 
metadata(0):
assays(1): counts
rownames(6): PSMB1 DLL1 ... PHF10 ENSSSCG00000004008
rowData names(5): feature_type genome id interval name
colnames(2397): J7#AAACAGCCAAGCTACC-1 J7#AAACAGCCACAGCCTG-1 ...
  J7#TTTGTCTAGGTTAGCT-1 J7#TTTGTTGGTGCAACTA-1
colData names(0):
Goultard59 commented 2 years ago

Hello,

The function import10xFeatureMatrix seems to import the scRNA object correctly.

class: RangedSummarizedExperiment 
dim: 21288 2397
metadata(0):
assays(1): counts
rownames(21288): ENSSSCG00000033408 ENSSSCG00000044554 ...
  ENSSSCG00000034846 ENSSSCG00000049924
rowData names(5): feature_type genome id interval name
colnames(2397): J7#AAACAGCCAAGCTACC-1 J7#AAACAGCCACAGCCTG-1 ...
  J7#TTTGTCTAGGTTAGCT-1 J7#TTTGTTGGTGCAACTA-1
colData names(0):

I also tested to re-run cellranger with an unmodified 104 pig ENSEMBL annotations which didn't have any effect.

kenxie7 commented 2 years ago

Hi!

I had a similar problem as mentioned above. After going through the code and debugging (ArchR/R/MatrixGeneExpression.R Lines 88-109), I found that one potential problem is that the chromosome names do not match from the ATAC data and the seRNA object (e.g., "chr1" vs "1")

I checked with seqnames(seRNA), and the chromosome names were numeric (i.e. "1"). I resolved the problem with the following code to modify the seRNA object (to match the name from ATAC arrow files) and worked around it.

# The new version (1.0.2) returns a list object for seRNA using the import10xFeatureMatrix function

seRNA2 <- seRNA[[1]]
gr_df <- data.frame(rowRanges(seRNA2))
gr_df$seqnames = paste0("chr", gr_df$seqnames)
#head(gr_df)
rowRanges(seRNA2) <- makeGRangesFromDataFrame(gr_df)
rownames(seRNA2) <- rownames(seRNA[[1]])

archrproj <- addGeneExpressionMatrix(input = archrproj, seRNA = seRNA2, force = TRUE)

Not sure if it helps, but hope you will solve your problem soon!

Goultard59 commented 2 years ago

Hi,

I have adapted our solution, it solve the problems but i still have trouble in script execution.

scRNA@rowRanges@elementMetadata@listData$id <- scRNA@rowRanges@elementMetadata@listData$name
scRNA@rowRanges@seqnames@values <- paste0("chr", scRNA@rowRanges@seqnames@values)
scRNA@rowRanges@seqnames@values <- factor(scRNA@rowRanges@seqnames@values)
scRNA@rowRanges@seqinfo@seqnames <- paste0("chr", scRNA@rowRanges@seqinfo@seqnames)

Solve the execution error ArchR-addGeneExpressionMatrix-1e843087fb4e-Date-2022-03-15_Time-11-52-03.log

Now, i get a new error following tutorials

archrproj <- addIterativeLSI(
    ArchRProj = archrproj, 
    clusterParams = list(
      resolution = 0.2, 
      sampleCells = 5000,
      n.start = 10
    ),
    saveIterations = FALSE,
    useMatrix = "GeneExpressionMatrix", 
    depthCol = "Gex_nUMI",
    varFeatures = 2500,
    firstSelection = "variable",
    binarize = FALSE,
    name = "LSI_RNA"
)

ArchR-addIterativeLSI-1e8478836a13-Date-2022-03-15_Time-13-01-28.log

Thanks for your helps.

rcorces commented 2 years ago

@Goultard59 can you upgrade to ArchR release_1.0.2 and re-try. Looks like you are hitting this same issue: https://github.com/GreenleafLab/ArchR/issues/638

devtools::install_github("GreenleafLab/ArchR", ref="release_1.0.2", repos = BiocManager::repositories())

Goultard59 commented 2 years ago

@rcorces

I'm still getting the same error after upgrading.

ArchR-addIterativeLSI-53b86a5f624e-Date-2022-03-22_Time-12-26-52.log

rcorces commented 2 years ago

@Goultard59 - I'm not able to download your log file. can you re-upload it?

Goultard59 commented 2 years ago

ArchR-addIterativeLSI-53b86a5f624e-Date-2022-03-22_Time-12-26-52.log

rcorces commented 2 years ago

have you subsetted this project? If so, did you do this using the subsetArchRProject() function or did you do this using ArchRProj[someCellNames]? If you subsetted without using subsetArchRProject() then that is likely the problem.

Goultard59 commented 2 years ago

I have reset my archrproject and delete this line archrproj <- archrproj[archrproj$TSSEnrichment > 6 & archrproj$nFrags > 2500]

But i'm still getting the same error message ArchR-addIterativeLSI-1e7e7d413598-Date-2022-03-23_Time-16-48-57.log ArchR_J7.r.txt

Thanks for your help

rcorces commented 2 years ago

Its hard for me to provide more help without knowing everything that you're doing.

The error you are getting is coming from here:

    if(!all(cellNames %in% colnames(mat))){
      .logThis(cellNames, "cellNames supplied", logFile = logFile)
      .logThis(colnames(mat), "cellNames from matrix", logFile = logFile)
      stop("Error not all cellNames found in partialMatrix")
    }

This basically is saying that the cellNames of your ArchRProject are not all present in the column names of your matrix. I guess in your case, maybe this is because you are using multiome data and you have some cells that had ATAC but dont have RNA?

Goultard59 commented 2 years ago

Hi,

It was the case, i have solved this issues with

archrproj <- subsetArchRProject(archrproj, intersect(ArchR::getCellNames(archrproj), colnames(scRNA)), force = TRUE)

Thanks for your helps Adrien.

rcorces commented 2 years ago

Thanks for confirming. I'm going to leave this open because I think we need to solve the underlying issue because this is bound to come up in the future.

rcorces commented 2 years ago

I've patched this in release_1.0.2. I'm still uncertain what the best way to handle this is. Currently, the solution is to warn users during addGeneExpressionMatrix() that not all cells match. https://github.com/GreenleafLab/ArchR/commit/b40e6b39e04cd3391642631631823f7f02f09764

rcorces commented 2 years ago

I believe this issue is solved.

Additionally, the mismatching seqnames mentioned in https://github.com/GreenleafLab/ArchR/issues/1286#issuecomment-1055578417 has been addressed via https://github.com/GreenleafLab/ArchR/commit/c196662744a4e08ae226c9a66d0a5ceb32a18738 on the dev branch and will be incorporated into release_1.0.3

XiaoKaixuan12333 commented 10 months ago

Hi,

I have adapted our solution, it solve the problems but i still have trouble in script execution.

scRNA@rowRanges@elementMetadata@listData$id <- scRNA@rowRanges@elementMetadata@listData$name
scRNA@rowRanges@seqnames@values <- paste0("chr", scRNA@rowRanges@seqnames@values)
scRNA@rowRanges@seqnames@values <- factor(scRNA@rowRanges@seqnames@values)
scRNA@rowRanges@seqinfo@seqnames <- paste0("chr", scRNA@rowRanges@seqinfo@seqnames)

Solve the execution error ArchR-addGeneExpressionMatrix-1e843087fb4e-Date-2022-03-15_Time-11-52-03.log

Now, i get a new error following tutorials

archrproj <- addIterativeLSI(
    ArchRProj = archrproj, 
    clusterParams = list(
      resolution = 0.2, 
      sampleCells = 5000,
      n.start = 10
    ),
    saveIterations = FALSE,
    useMatrix = "GeneExpressionMatrix", 
    depthCol = "Gex_nUMI",
    varFeatures = 2500,
    firstSelection = "variable",
    binarize = FALSE,
    name = "LSI_RNA"
)

ArchR-addIterativeLSI-1e8478836a13-Date-2022-03-15_Time-13-01-28.log

Thanks for your helps.

hi,I am sorroy to open this question again, i also meet the same question, but I check my scRNA, it seems no scRNA@rowRanges@elementMetadata@listData$name and scRNA@rowRanges@seqnames@values. scRNA only have the information of gene and cell, no any chromosome. my scRNA is covert from the seurat RNA assays. I don't know what is wrong with this.