constantAmateur / SoupX

R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data

Cell Ranger multi support #110

Closed gilstel closed 2 years ago

gilstel commented 2 years ago

load10X doesn't work for Cell Ranger multi output because it has a different directory structure: the filtered and raw counts are stored in two different locations. I therefore used:

table.droplets.gex.hto.channel.1 = Read10X(data.dir = "/path/to/count/sample_feature_bc_matrix")
dim(table.droplets.gex.hto.channel.1$`Gene Expression`)

[1] 32285 8424

table.counts.gex.hto.channel.1 = Read10X(data.dir = "/path/to/count/raw_feature_bc_matrix")
dim(table.counts.gex.hto.channel.1$`Gene Expression`)

[1] 32285 701967

The matrices do not have the same dimensions (as expected)

soup.channel = SoupChannel(tod = table.droplets.gex.hto.channel.1$`Gene Expression`, toc = table.counts.gex.hto.channel.1$`Gene Expression`)
dim(soup.channel$toc)

[1] 32285 701967

I then created a meta.data data frame:

head(metadata.cell.ranger)

                         RD1       RD2 cluster
AAACCTGAGACGCAAC-1 -6.205565 -1.484691       1
AAACCTGAGAGCTTCT-1  5.701532 -4.535702       7
AAACCTGAGCCTTGAT-1 -6.160045 -2.895694       1
AAACCTGAGTCGTTTG-1 -3.067753 -1.458081       1
AAACCTGAGTGTACGG-1  5.203574  6.485120       6
AAACCTGCAAAGCGGT-1  4.044699  5.431659       1

soup.channel = setClusters(soup.channel, setNames(metadata.cell.ranger$cluster, rownames(metadata.cell.ranger)))

Error in setClusters(soup.channel, setNames(metadata.cell.ranger$cluster, : Invalid cluster specification. See help.

I also tried:

soup.channel = SoupChannel(tod = table.droplets.gex.hto.channel.1$`Gene Expression`, toc = table.counts.gex.hto.channel.1$`Gene Expression`, metaData = metadata.cell.ranger)

Error in SoupChannel(tod = table.droplets.gex.hto.channel.1$`Gene Expression`, : Rownames of metaData must match column names of table of counts.
In addition: Warning message:
In sort(colnames(toc)) == sort(rownames(metaData)) : longer object length is not a multiple of shorter object length

It makes sense that the number of barcodes in the raw data is greater than that in the filtered data.

What would you recommend doing? I thought that SoupX needs all the raw counts to clean the data, so it doesn't make sense to me to take only the filtered barcodes as cells. If the recommended course of action here is to take only the filtered barcodes, then what is the purpose of the raw counts?

Thanks, Gil

constantAmateur commented 2 years ago

It is very hard to know what the problem is without example code and output. But it sounds like you haven’t constructed the SoupChannel object in the way the package expects. The tod parameter should be the full table of counts (all 701,967 columns), while the toc parameter should be a subset of tod, consisting of just those columns that contain cells. In your case that should be the 8,424 barcodes representing cells.
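As a rough illustration of the tod/toc relationship described above, here is a minimal base R sketch on a toy matrix. The gene names, barcodes, and the choice of which barcodes count as cells are all made up for the example; with real data, tod would be the `Gene Expression` matrix read from raw_feature_bc_matrix and toc the one read from sample_feature_bc_matrix. The SoupChannel call is left as a comment so the sketch runs without SoupX installed.

```r
# Toy stand-in for the two Cell Ranger matrices (genes x barcodes).
# All names and values below are invented for illustration.
set.seed(1)
genes    <- paste0("gene", 1:5)
barcodes <- paste0("BC", 1:20, "-1")

# "Raw" matrix: every barcode, including empty droplets.
raw_counts <- matrix(rpois(5 * 20, lambda = 2), nrow = 5,
                     dimnames = list(genes, barcodes))

# Barcodes called as cells (hypothetical selection; in practice these
# are the column names of the filtered matrix).
cell_barcodes <- barcodes[c(2, 5, 11, 17)]

tod <- raw_counts                                  # table of droplets: all barcodes
toc <- raw_counts[, cell_barcodes, drop = FALSE]   # table of counts: cells only

# Sanity checks: same genes in the same order, and toc is a column subset of tod.
stopifnot(identical(rownames(tod), rownames(toc)))
stopifnot(all(colnames(toc) %in% colnames(tod)))
stopifnot(ncol(toc) < ncol(tod))

# With SoupX loaded, the channel would then be built as:
# soup.channel <- SoupChannel(tod = tod, toc = toc)
```

If the second check fails on real data, the two matrices probably do not come from the same run, or their barcodes are formatted differently.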

The other thing that can frequently go wrong is that clusters are matched based on the cell IDs, which are the column names of toc/tod. So perhaps check these are consistent.
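One way to check that the cell IDs are consistent is to compare the names of the cluster vector against the column names of toc before calling setClusters. The sketch below does this in base R on a toy example. The matrix values and gene names are invented, the barcodes and cluster labels are borrowed from the metadata shown earlier in the thread, and the metadata deliberately includes one barcode that is not a column of toc.

```r
# Toy count table with three cells (values are made up).
cell_ids <- c("AAACCTGAGACGCAAC-1", "AAACCTGAGAGCTTCT-1", "AAACCTGAGCCTTGAT-1")
toc <- matrix(1:6, nrow = 2,
              dimnames = list(c("geneA", "geneB"), cell_ids))

# Hypothetical metadata: a different row order and one extra barcode.
metadata <- data.frame(
  cluster   = c(7, 1, 1, 6),
  row.names = c("AAACCTGAGAGCTTCT-1", "AAACCTGAGACGCAAC-1",
                "AAACCTGAGCCTTGAT-1", "AAACCTGAGTGTACGG-1")
)

# Named vector: values are cluster IDs, names are cell IDs.
clusters <- setNames(metadata$cluster, rownames(metadata))

# Cells in toc without a cluster label, and labels without a matching cell.
missing_labels <- setdiff(colnames(toc), names(clusters))
extra_labels   <- setdiff(names(clusters), colnames(toc))

# Restrict and reorder the labels to exactly the cells in toc.
clusters_matched <- clusters[colnames(toc)]

# With SoupX loaded, this vector would then be passed on as:
# soup.channel <- setClusters(soup.channel, clusters_matched)
```

A non-empty missing_labels would be the first thing to look at when chasing an "Invalid cluster specification" error. A typical cause is barcodes that differ only in a suffix such as "-1".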