constantAmateur / SoupX

R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data

table of droplets (tod) and table of counts (toc) have different numbers of genes #148

Open al3xmlt030 opened 4 months ago

al3xmlt030 commented 4 months ago

Hi, I've encountered an issue after running the cellranger multi pipeline for scRNASeq with a fixed protocol. The pipeline has generated a directory structure with "per_sample_outs", wherein each sample contains a "count" folder with both "filtered_feature_bc_matrix" and "raw_feature_bc_matrix". However, I've noticed a discrepancy in the number of genes between these two matrices, leading to the following problem:

toc = Seurat::Read10X(file.path(tmpDir, "filtered_feature_bc_matrix"))
tod = Seurat::Read10X(file.path(tmpDir, "raw_feature_bc_matrix"))
sc = SoupChannel(tod, toc)

Error in SoupChannel(tod, toc): The provided table of droplets (tod) and table of counts (toc) have different numbers of genes. Both tod and toc must have the same genes in the same order. Traceback:

Thanks a lot in advance!

jcorn427 commented 4 months ago

I've run into the same issue. I initially tried the load10X() function, but it doesn't work because regular Cell Ranger and Cell Ranger multi produce different directory structures. So I then tried what you've done here, building the SoupChannel object from the raw matrices, and I get the same error as you.

Edit: Just wanted to add that I was hoping it would work since you're using cell ranger multi to generate singleplex data from the fixed samples. However, I don't know if support for cell ranger multi will be added.

changostraw commented 4 months ago

I get this error as well trying to apply SoupX to cellranger multi outs. Does anyone know a workaround?

gyanmishra commented 3 months ago

I am having a similar issue. Has anyone found a fix for this?

h5.files = list.files("results/20240304D3A_Seur_R/",pattern = "*.h5",full.names = TRUE)
raw.matrix.files = h5.files[grepl('_raw',h5.files)]
filt.matrix_files =  h5.files[!(grepl('_raw',h5.files))]

raw.matrix <- lapply(raw.matrix.files,
                      function(x){
                        Read10X_h5(x,use.names = F)})

filt.matrix  <- lapply(filt.matrix_files, 
                      function(x){
                        Read10X_h5(x,use.names = F)})

soup.channel  <- for(i in 1:length(raw.matrix)){SoupChannel(raw.matrix[i], filt.matrix[i])}

Error in if (nrow(tod) != nrow(toc)) stop("The provided table of droplets (tod) and table of counts (toc) have different numbers of genes. Both tod and toc must have the same genes in the same order.") : argument is of length zero
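
A note on the error above: `argument is of length zero` is what `if` reports when its condition is empty, which is consistent with `nrow()` returning NULL for a list. Single-bracket indexing (`raw.matrix[i]`) returns a one-element list rather than the matrix itself, and a `for` loop always evaluates to NULL, so `soup.channel` would be empty even if the call succeeded. The base-R sketch below uses small made-up matrices (not real 10x data) to show the difference. `pair_dims` is a hypothetical helper, and the `SoupChannel` call appears only as a comment.

```r
# Toy stand-ins for the lists returned by lapply(..., Read10X_h5); not real data.
raw.matrix  <- list(matrix(0, nrow = 5, ncol = 8), matrix(0, nrow = 6, ncol = 9))
filt.matrix <- list(matrix(0, nrow = 5, ncol = 3), matrix(0, nrow = 6, ncol = 4))

# Single brackets keep the list wrapper, so nrow() is NULL.
single <- nrow(raw.matrix[1])
# Double brackets extract the matrix itself.
double <- nrow(raw.matrix[[1]])

# A for loop evaluates to NULL, so its "result" cannot be assigned.
loop_value <- for (i in seq_along(raw.matrix)) nrow(raw.matrix[[i]])

# Map() walks both lists in parallel and returns one result per sample.
# pair_dims is a hypothetical helper standing in for the real call, e.g.
#   Map(function(tod, toc) SoupX::SoupChannel(tod, toc), raw.matrix, filt.matrix)
pair_dims <- function(tod, toc) c(tod_genes = nrow(tod), toc_genes = nrow(toc))
per_sample <- Map(pair_dims, raw.matrix, filt.matrix)
```

Switching to `[[i]]` should only remove the `length zero` failure. If the raw and filtered matrices really do contain different genes, the gene-count error discussed in this thread would presumably still appear.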

wblashka commented 3 months ago

I'm having a similar issue. I won't have time to try and troubleshoot myself, but I am wondering if this is the result of Cellranger automatically filtering out deprecated probes from their FRP protocol. Based on the description of Cellranger multi's outputs, it seems like the raw matrix includes these probes while the filtered matrix does not. Perhaps these are responsible for the discrepancy? If anyone is able to attempt to remove these probes from a raw matrix and see if that resolves the issue, I would love to know... otherwise I will attempt this in a couple of weeks.

NathanKochhar commented 3 months ago

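If your object has multiple assays, this will fix it:

library(Seurat)
library(SoupX)

# With multiple feature types, Read10X() returns a named list of matrices.
toc <- Read10X(data.dir = "/filtered_feature_bc_matrix")
tod <- Read10X(data.dir = "/raw_feature_bc_matrix")
# Keep only the gene expression matrix from each.
toc <- toc[["Gene Expression"]]
tod <- tod[["Gene Expression"]]
sc <- SoupChannel(tod, toc, calcSoupProfile = FALSE)
sc <- estimateSoup(sc)

aspides-js commented 3 months ago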

@wblashka following your suggestion, I filtered the raw matrix to include only the probes marked as included = TRUE in cellranger's probe_set.csv output, but unfortunately this doesn't resolve the issue: in my case it accounted for only 419 of the 13285 genes in the discrepancy. As a workaround, the function works after filtering the raw matrix to drop the rows that setdiff() finds when comparing the rownames of the raw and filtered matrices.
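
The setdiff() workaround described above might look roughly like the following. This is a minimal sketch on toy matrices with made-up gene names, assuming `tod` and `toc` are plain matrices with gene names as rownames. Real data from Read10X() would be sparse matrices, which support the same indexing.

```r
# Toy matrices with made-up gene names; real tod/toc would come from Read10X().
tod <- matrix(1, nrow = 5, ncol = 4,
              dimnames = list(c("G1", "G2", "G3", "G4", "G5"), paste0("bc", 1:4)))
toc <- matrix(1, nrow = 3, ncol = 2,
              dimnames = list(c("G1", "G3", "G4"), c("bc1", "bc2")))

# Genes present in the raw matrix but missing from the filtered one.
only_in_raw <- setdiff(rownames(tod), rownames(toc))
# Genes present in the filtered matrix but missing from the raw one (ideally none).
only_in_filtered <- setdiff(rownames(toc), rownames(tod))

# Drop the raw-only rows; drop = FALSE keeps the result a matrix.
tod_trimmed <- tod[!(rownames(tod) %in% only_in_raw), , drop = FALSE]
```

Checking `only_in_filtered` as well is a cheap sanity test: if it is not empty, trimming the raw matrix alone will not make the two gene sets match.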

RB786 commented 2 months ago

I faced a similar issue. My raw and filtered HDF5 files have different numbers of genes. I removed the genes that do not match between the two files, and then it worked. However, I am not sure if this is the right approach. Has anyone solved it?

jnmnbals commented 2 months ago

Adding myself to the list of users running into this issue. Hoping someone has found a workaround or two for this.

> I faced a similar issue. My raw and filtered HDF5 files have different numbers of genes. I removed the genes that do not match between the two files, and then it worked. However, I am not sure if this is the right approach. Has anyone solved it?

@RB786 Would you mind sharing how you went about filtering? Still very new to the bioinformatics world.

imet-k commented 1 month ago

I solved it like this if anyone is interested: (filt.matrix is toc and raw.matrix tod)

filt_genes <- rownames(filt.matrix)

# Subset raw.matrix to keep only the genes in filt.matrix
raw.matrix_subset <- raw.matrix[rownames(raw.matrix) %in% filt_genes, ]
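
Since the SoupChannel error message asks for the same genes in the same order, a slightly more defensive variant of the subsetting above indexes both matrices by one shared list of gene names. This is a sketch on toy data with made-up gene names, assuming rownames are unique (Read10X() makes feature names unique by default).

```r
# Toy matrices; the raw one is deliberately in a different gene order.
tod <- matrix(seq_len(20), nrow = 5, ncol = 4,
              dimnames = list(c("G5", "G1", "G2", "G4", "G3"), paste0("bc", 1:4)))
toc <- matrix(seq_len(6), nrow = 3, ncol = 2,
              dimnames = list(c("G1", "G3", "G4"), c("bc1", "bc2")))

# Shared genes, listed in the order they appear in the filtered matrix.
shared <- rownames(toc)[rownames(toc) %in% rownames(tod)]

# Indexing by name subsets and reorders in one step.
tod_aligned <- tod[shared, , drop = FALSE]
toc_aligned <- toc[shared, , drop = FALSE]

# Both tables now list the same genes in the same order.
stopifnot(identical(rownames(tod_aligned), rownames(toc_aligned)))
```

The aligned pair would then be passed on as `SoupChannel(tod_aligned, toc_aligned)`. Whether dropping the extra genes from the raw matrix affects the soup estimate is not settled in this thread.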

afletch00 commented 1 month ago

Adding myself as well. @imet-k, your method worked, thank you!!! I am wondering if other issues have popped up when using the FLEX assay.