Cell Ranger multi support

SoupX's load10X doesn't work on Cell Ranger multi output because the directory structure is different: the filtered and raw count matrices are stored in two separate locations. I therefore loaded them with Read10X:

table.droplets.gex.hto.channel.1 = Read10X(data.dir = "/path/to/count/sample_feature_bc_matrix")
dim(table.droplets.gex.hto.channel.1$`Gene Expression`)

[1] 32285 8424

table.counts.gex.hto.channel.1 = Read10X(data.dir = "/path/to/count/raw_feature_bc_matrix")
dim(table.counts.gex.hto.channel.1$`Gene Expression`)

[1] 32285 701967

The matrices do not have the same number of columns (as expected): 8424 barcodes versus 701967.

soup.channel = SoupChannel(tod = table.droplets.gex.hto.channel.1$`Gene Expression`, toc = table.counts.gex.hto.channel.1$`Gene Expression`)
dim(soup.channel$toc)

[1] 32285 701967
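
The dimensions printed above are worth a second look. As far as the SoupX documentation describes it, `tod` is the table of droplets (every barcode, including empty ones) and `toc` is the table of counts (only the barcodes called as cells), so `toc` would normally be the narrower matrix. Below is a minimal base-R sketch of that sanity check on toy matrices; `check_orientation` and the barcodes are hypothetical names made up for illustration:

```r
# Toy stand-ins: 3 genes, 5 raw barcodes, of which 2 were called as cells.
raw <- matrix(1:15, nrow = 3,
              dimnames = list(paste0("gene", 1:3), paste0("BC", 1:5, "-1")))
filtered <- raw[, c("BC2-1", "BC4-1")]

# Hypothetical helper: TRUE when `toc` looks like a cell-only subset of `tod`.
check_orientation <- function(tod, toc) {
  ncol(toc) <= ncol(tod) && all(colnames(toc) %in% colnames(tod))
}

check_orientation(tod = raw, toc = filtered)   # TRUE
check_orientation(tod = filtered, toc = raw)   # FALSE
```

In the call above, the 8424-column matrix is passed as `tod` and the 701967-column matrix as `toc`, which corresponds to the second case.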

I then created a metadata data frame:

head(metadata.cell.ranger)

                         RD1       RD2 cluster
AAACCTGAGACGCAAC-1 -6.205565 -1.484691       1
AAACCTGAGAGCTTCT-1  5.701532 -4.535702       7
AAACCTGAGCCTTGAT-1 -6.160045 -2.895694       1
AAACCTGAGTCGTTTG-1 -3.067753 -1.458081       1
AAACCTGAGTGTACGG-1  5.203574  6.485120       6
AAACCTGCAAAGCGGT-1  4.044699  5.431659       1

soup.channel = setClusters(soup.channel, setNames(metadata.cell.ranger$cluster, rownames(metadata.cell.ranger)))

Error in setClusters(soup.channel, setNames(metadata.cell.ranger$cluster, : Invalid cluster specification. See help.
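
One way to investigate why the cluster specification is rejected is to compare the barcodes in `soup.channel$toc` with the names of the cluster vector. The sketch below assumes, without having confirmed it against the SoupX source, that `setClusters` expects a label for every column of `toc`. The toy barcodes and variable names are illustrative only:

```r
# Toy stand-ins for colnames(soup.channel$toc) and the named cluster vector.
toc_barcodes <- paste0("BC", 1:5, "-1")
clusters <- setNames(c(1, 7), c("BC2-1", "BC4-1"))

# Barcodes in the count table that have no cluster label.
missing <- setdiff(toc_barcodes, names(clusters))
length(missing)                          # 3
all(toc_barcodes %in% names(clusters))   # FALSE
```

With the real objects, `setdiff(colnames(soup.channel$toc), rownames(metadata.cell.ranger))` would list the unlabelled barcodes. A non-empty result points to a mismatch between the cells in the channel and the cells in the metadata.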

I also tried

soup.channel = SoupChannel(tod = table.droplets.gex.hto.channel.1$`Gene Expression`, toc = table.counts.gex.hto.channel.1$`Gene Expression`, metaData = metadata.cell.ranger)

Error in SoupChannel(tod = table.droplets.gex.hto.channel.1$`Gene Expression`, : Rownames of metaData must match column names of table of counts.
In addition: Warning message:
In sort(colnames(toc)) == sort(rownames(metaData)) : longer object length is not a multiple of shorter object length

It makes sense that the raw data contain more barcodes than the filtered data.

What would you recommend doing? I thought that SoupX needs all the raw counts to clean the data, so it doesn't make sense to me to treat only the filtered barcodes as cells. If the recommended course of action is to use only the filtered barcodes, then what is the purpose of the raw counts?

Thanks, Gil
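
One possible way forward, sketched under assumptions: SoupX uses the raw matrix to estimate the background (soup) profile from empty droplets and then corrects only the cell-containing barcodes. Both matrices are therefore needed, with the raw one as `tod` and the filtered one as `toc`. The wrapper name `build_soup_channel` and its arguments are hypothetical; `Read10X`, `SoupChannel`, and `setClusters` are the functions already used in this post. The metadata is assumed to cover every filtered barcode, and `Read10X` is assumed to return a list with a `Gene Expression` element, as it does above.

```r
# Hypothetical wrapper; needs the Seurat and SoupX packages when it is called.
build_soup_channel <- function(raw_dir, filtered_dir, meta) {
  # Raw matrix: all droplets, used to estimate the soup profile.
  tod <- Seurat::Read10X(data.dir = raw_dir)$`Gene Expression`
  # Filtered matrix: barcodes called as cells, the counts to be corrected.
  toc <- Seurat::Read10X(data.dir = filtered_dir)$`Gene Expression`

  # Assumption: every cell in `toc` has a row in `meta`.
  stopifnot(all(colnames(toc) %in% rownames(meta)))

  sc <- SoupX::SoupChannel(tod = tod, toc = toc)
  SoupX::setClusters(sc, setNames(meta$cluster, rownames(meta)))
}

# Example call with the paths from this post:
# soup.channel <- build_soup_channel(
#   raw_dir      = "/path/to/count/raw_feature_bc_matrix",
#   filtered_dir = "/path/to/count/sample_feature_bc_matrix",
#   meta         = metadata.cell.ranger
# )
```

If the `stopifnot` check fails, the metadata and the filtered matrix describe different sets of cells, and one of them would need to be subset to the shared barcodes before building the channel.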

constantAmateur / SoupX

Cell Ranger multi support #110