COMBINE-lab / alevin-fry

🐟 🔬🦀 alevin-fry is an efficient and flexible tool for processing single-cell sequencing data, currently focused on single-cell transcriptomics and feature barcoding.
https://alevin-fry.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Merging replicates with different permit lists #127

Open ardecasien opened 1 year ago

ardecasien commented 1 year ago

Hello! I’m using alevin-fry to map and quantify single nuclei RNAseq data. I’m trying to merge technical replicates but need to use separate permit lists (each generated from a provided list of valid barcodes) for each replicate. Is there a way to do this (by e.g. merging collated RAD files before quantification)?

rob-p commented 1 year ago

Hi @ardecasien,

Thanks for the question. Let me see if I understand your use case. You have separate technical replicates. Each replicate has a list of valid barcodes, and you'd like to merge the information from the different replicates into an aggregate quantification.

I guess the best way to do this depends on how the data was prepared. For example, were the technical replicates simply sequenced separately, or were the samples prepared separately as well? If they were prepared separately, then there's no obvious relationship between the barcodes in one sample and those in another, so it's not clear how they could be aggregated. On the other hand, if the samples were prepared together and then simply sequenced separately, then the barcodes have the same "meaning" across replicates. In that case, the easiest thing to do would be either to quantify them together in the first place (i.e. merge all of the reads from the different replicates), or to quantify each replicate separately, end to end, and then add the corresponding count matrices. These reflect slightly different mental models of how one is using the technical replicates, but we're happy to talk through the details of these different choices.

Also, I'm pinging @DongzeHE and @k3yavi here in case they have input or thoughts on this as well.

--Rob
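The second option Rob describes, quantifying each replicate end to end and then adding the count matrices when barcodes have the same meaning across replicates, can be sketched as follows. This is a minimal illustration, not alevin-fry functionality: it assumes each replicate's output has already been loaded into a hypothetical in-memory mapping from `(cell barcode, gene ID)` to a count, and that both replicates were quantified against the same reference so gene IDs are comparable.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical representation (not an alevin-fry format):
# (cell barcode, gene ID) -> count
CountMatrix = Dict[Tuple[str, str], int]


def sum_replicates(replicates: Iterable[CountMatrix]) -> CountMatrix:
    """Add count matrices whose barcodes refer to the same cells in every
    replicate (libraries prepared together, sequenced separately)."""
    total: Counter = Counter()
    for matrix in replicates:
        # Counter.update with a mapping adds counts instead of replacing them,
        # so identical (barcode, gene) entries are summed across replicates.
        total.update(matrix)
    return dict(total)


rep1 = {("AAAC", "GeneA"): 3, ("AAAC", "GeneB"): 1, ("TTTG", "GeneA"): 2}
rep2 = {("AAAC", "GeneA"): 2, ("GGGC", "GeneB"): 4}
merged = sum_replicates([rep1, rep2])
print(merged[("AAAC", "GeneA")])  # 5
```

Summing only makes sense because barcode `AAAC` is assumed to be the same cell in both replicates; if that assumption does not hold, this merge would wrongly combine unrelated cells.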

ardecasien commented 1 year ago

Thank you for your swift reply!

We have some samples for which there are two technical replicates that were prepared separately. Since quantification occurs on a cell-by-cell basis, does this mean that we can process each replicate separately (including the quantification step) and then merge the resulting gene x cell count matrices (i.e., combine all cells attributed to the sample across replicates)?
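The merge proposed in this question, concatenating the cells from separately prepared replicates rather than summing them, might look like the sketch below. It is an illustration under stated assumptions, not a maintainer-confirmed workflow: it uses the same hypothetical `(cell barcode, gene ID) -> count` mapping as an in-memory stand-in for each replicate's quantified matrix, assumes both replicates were quantified against the same reference, and prefixes each barcode with a hypothetical replicate label. The prefix matters because, as noted earlier in the thread, barcodes from separately prepared libraries have no relationship to each other, so the same barcode string in two replicates should remain two distinct cells.

```python
from typing import Dict, Tuple

# Hypothetical representation (not an alevin-fry format):
# (cell barcode, gene ID) -> count
CountMatrix = Dict[Tuple[str, str], int]


def concat_replicates(replicates: Dict[str, CountMatrix]) -> CountMatrix:
    """Combine cells from separately prepared replicates into one matrix,
    keeping every replicate's cells distinct instead of summing them."""
    merged: CountMatrix = {}
    for label, matrix in replicates.items():
        for (barcode, gene), count in matrix.items():
            # The hypothetical replicate label keeps identical barcode strings
            # from different libraries from collapsing into one cell.
            merged[(f"{label}_{barcode}", gene)] = count
    return merged


rep1 = {("AAAC", "GeneA"): 3, ("TTTG", "GeneB"): 1}
rep2 = {("AAAC", "GeneA"): 2}
merged = concat_replicates({"rep1": rep1, "rep2": rep2})
cells = sorted({barcode for barcode, _ in merged})
print(cells)  # ['rep1_AAAC', 'rep1_TTTG', 'rep2_AAAC']
```

Unlike summing, this keeps `AAAC` from each replicate as a separate column, which matches the reasoning that barcodes from independently prepared libraries cannot be matched to one another.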