Feature request : Demultiplexing to process 10x Fixed RNA Profiling kit

rob-p commented 2 years ago

The 10x Fixed RNA Profiling kit has some benefits with respect to the standard kit. For example, cells can be stored before library prep, and there is a gene-targeting mechanism based on specific targeted probes rather than standard polyA capture. Finally, and relevant to this feature request, samples can be efficiently multiplexed with the use of an additional barcode, the pCS1 probe described here.

One could, of course, demultiplex the initial reads and process each pCS1 barcode separately with alevin -> alevin-fry, which is what the current recommendation would be. However, it would be much more efficient (and easier for the user) if this could be handled internally for samples prepared with this chemistry.

The purpose of this issue is to discuss how best to handle this feature. Currently, there are 3 different ideas (but I'm open to more).

1. Have salmon "handle" this, such that when this chemistry is processed, many different "quant" directories are created, one for each pCS1 barcode. The benefit here is that each barcode can then be handled separately without any modifications required to alevin-fry. The downsides are that we will create many potential output directories, we don't know how many there will be or what their names are a priori, and this will add non-trivial complexity on the salmon side.
2. Have salmon "handle" this by outputting many distinct RAD files in the output quant directory, one for each pCS1 barcode. This is cleaner than 1 in my opinion, but still not ideal. Specifically, we don't know a priori what barcodes exist in the sample before processing, so we don't know what RAD files will be created, how many there will be, or how they should be named. This gets tricky as RAD files will need to be created dynamically, and that's non-trivial from the highly multithreaded context that exists during read mapping.
3. So far, this is my favorite proposed solution: when processing this type of chemistry, have salmon output an augmented RAD file. In addition to the UMI and barcode, this RAD file will include a field for the pCS1 barcode of each processed read. However, all records will be routed to the same initial RAD file. Subsequently, we will add a command to alevin-fry (e.g. alevin-fry demux) that will be capable of demultiplexing this into many distinct "standard" RAD files, which can then be processed as normal. This has the complication that we need to add the ability to parse and demultiplex the augmented RAD format, and we need to determine how this should affect the downstream steps (e.g. it may no longer be true that, for such a sample, the quant directory contains a single map.rad file after demultiplexing). However, this solution currently seems the cleanest to me, and I feel like it is generally a good principle to push such processing off to alevin-fry rather than require it upstream in salmon where possible.
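To make the third idea concrete, here is a minimal Python sketch of the demultiplexing step. It does not read the real RAD format, and it is not how alevin-fry works today: `AugmentedRecord`, `StandardRecord`, and `demux` are hypothetical names, and the records are plain in-memory tuples standing in for whatever an augmented RAD file would hold. The point it illustrates is that the set of output groups is only discovered while streaming through the single input, so nothing needs to be known a priori.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class AugmentedRecord(NamedTuple):
    """Hypothetical stand-in for one record of the augmented RAD file."""
    cell_barcode: str
    umi: str
    probe_barcode: str            # the extra per-read field proposed above
    target_ids: Tuple[int, ...]   # references the read mapped to


class StandardRecord(NamedTuple):
    """Hypothetical stand-in for a record of a "standard" RAD file."""
    cell_barcode: str
    umi: str
    target_ids: Tuple[int, ...]


def demux(records: Iterable[AugmentedRecord]) -> Dict[str, List[StandardRecord]]:
    """Group records by probe barcode in a single pass.

    Each key of the result corresponds to one demultiplexed output;
    the extra barcode field is dropped from the records themselves.
    """
    groups: Dict[str, List[StandardRecord]] = defaultdict(list)
    for rec in records:
        groups[rec.probe_barcode].append(
            StandardRecord(rec.cell_barcode, rec.umi, rec.target_ids)
        )
    return dict(groups)
```

In a real implementation each group would presumably be written to its own output file rather than held in memory, and barcode error correction would need to happen before grouping; both are left out of this sketch.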

Anyway, thanks to @ATpoint for raising this request. I'm also pinging @DongzeHE and @k3yavi for their input here.

ATpoint commented 2 years ago

Thanks @rob-p for opening the issue. Just as a comment, I was wrong to assume that the additional barcode was in R1, and it's called Probe BC, not pCS1, sorry for that. After diving deeper into the protocol (pages 23 and 79), I realize that the additional barcode is 8bp long and is sequenced as part of Read2, not Read1. Read1 as usual covers the 16bp CB and 12bp UMI, and Read2 covers the ligated gene expression probe (let's call it GExp), followed by a constant linker sequence and the 8bp Probe BC. So in any case I guess one needs to make sure that this constant linker does not end up being quantified together with the gene expression part of Read2. I guess the trick would be to scan Read2 for the constant linker and then consider the part upstream of it for the GExp quantification, and the 8bp downstream for the Probe BC. That probably should then be an additional flag/module in alevin/alevin-fry, something like --chromiumFixedRNA to trigger all this internally, as it seems to be quite different from regular 10x 3' libraries. The GExp size is about 50bp, and the constant linker is 16bp.
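The Read2 scan described above could look roughly like the following Python sketch. It is an illustration only: `split_read2` is a hypothetical helper, the linker sequence must be supplied by the caller (the real sequence is not given in this thread), and the search is an exact match, whereas a real implementation would probably need to tolerate sequencing errors in the linker.

```python
from typing import NamedTuple, Optional

PROBE_BC_LEN = 8  # Probe BC length as stated in the comment above


class SplitRead2(NamedTuple):
    gexp: str       # sequence upstream of the linker, used for quantification
    probe_bc: str   # the 8bp immediately downstream of the linker


def split_read2(read2: str, linker: str) -> Optional[SplitRead2]:
    """Split Read2 at the first exact occurrence of the constant linker.

    Returns None if the linker is absent or if fewer than PROBE_BC_LEN
    bases follow it, so that such reads can be skipped by the caller.
    """
    pos = read2.find(linker)
    if pos < 0:
        return None
    bc_start = pos + len(linker)
    probe_bc = read2[bc_start:bc_start + PROBE_BC_LEN]
    if len(probe_bc) < PROBE_BC_LEN:
        return None
    return SplitRead2(gexp=read2[:pos], probe_bc=probe_bc)
```

Only the `gexp` part would then be passed on for mapping, which keeps the linker from being quantified along with the gene expression probe.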

laurie-tonon commented 1 year ago

Hi,

I was wondering if this functionality has already been implemented in alevin-fry. I have 10x multiplexed FRP data that I am having trouble analyzing with cellranger, and I would like to try alevin-fry on it.

Thanks

rekren commented 2 months ago

Hi @rob-p and the team !

This Flex (10x Fixed RNA Profiling kit) chemistry is becoming more and more popular among the user base for operational and cost-effectiveness reasons.

During the scverse2024 conference in Munich, this feature request was also raised by several users.

Can we expect an implementation supporting these libraries soon ?

P.S. : It is a real bummer to be forced to use the CellRanger multi pipeline for this task :(

COMBINE-lab / alevin-fry

Feature request : Demultiplexing to process 10x Fixed RNA Profiling kit #85