COMBINE-lab / simpleaf

A Rust framework to make using alevin-fry even simpler
BSD 3-Clause "New" or "Revised" License

[feature request] 10x chemistry autodetection #158

Open rob-p opened 3 months ago

rob-p commented 3 months ago

A recurring feature request: provide automatic chemistry detection, at least in the case where we know that the input data is 10x. This would look something like passing -c auto10x, and simpleaf would then determine the chemistry present in the input. It's probably OK to ignore 10x v1 (which requires 3 input files anyway), but most other single-cell RNA-seq chemistries should be detectable.

The basic idea would be to look at the combination of UMI and barcode lengths, together with the overlap between the barcodes observed in a prefix of the reads and each of the available permit lists.
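
To make the idea concrete, here is a minimal sketch of the overlap heuristic, not simpleaf's actual implementation. The names `Candidate`, `match_fraction`, `detect` and the `min_frac` threshold are all hypothetical, and the permit lists and barcode/UMI lengths are assumed to be supplied by the caller. It also assumes the barcode sits at the start of read 1, immediately followed by the UMI.

```rust
use std::collections::HashSet;

/// A candidate chemistry (hypothetical type): barcode/UMI geometry plus its permit list.
pub struct Candidate<'a> {
    pub name: &'a str,
    pub barcode_len: usize,
    pub umi_len: usize,
    pub permit_list: &'a HashSet<String>,
}

/// Fraction of the sampled read-1 sequences that (a) are long enough to hold
/// barcode + UMI for this candidate and (b) start with a barcode found in
/// the candidate's permit list.
pub fn match_fraction(read1: &[&str], cand: &Candidate) -> f64 {
    if read1.is_empty() {
        return 0.0;
    }
    let need = cand.barcode_len + cand.umi_len;
    let mut hits = 0usize;
    for r in read1 {
        let long_enough = r.len() >= need;
        // `get` returns None instead of panicking if the read is too short.
        let in_list = r
            .get(..cand.barcode_len)
            .map_or(false, |bc| cand.permit_list.contains(bc));
        if long_enough && in_list {
            hits += 1;
        }
    }
    hits as f64 / read1.len() as f64
}

/// Return the name of the candidate with the highest match fraction,
/// provided that fraction reaches `min_frac`; otherwise None.
pub fn detect<'a>(read1: &[&str], cands: &[Candidate<'a>], min_frac: f64) -> Option<&'a str> {
    let mut best: Option<(&'a str, f64)> = None;
    for c in cands {
        let f = match_fraction(read1, c);
        if f >= min_frac && best.map_or(true, |(_, bf)| f > bf) {
            best = Some((c.name, f));
        }
    }
    best.map(|(name, _)| name)
}
```

In practice `read1` would be the first few thousand read-1 sequences from the FASTQ input, and the candidates would be built from the 10x permit lists. Chemistries whose permit lists overlap heavily would probably need an extra tie-breaking rule beyond the simple "highest fraction wins" used here.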

AndrewSkelton commented 1 month ago

CellRanger's implementation of chemistry auto-detection is public and available here (already in Rust): https://github.com/10XGenomics/cellranger/blob/a03981609639e55d3bef57811194c7197e8590b2/lib/rust/cr_lib/src/stages/detect_chemistry.rs#L337

While you're probably already aware of this, I'll share it for posterity if nothing else.

rob-p commented 1 month ago

Thanks @AndrewSkelton, though given their license, we have to be careful here!

microbemarsh commented 1 month ago

I just wanted to add to this: I would really appreciate it if you could include the 10x ARC multiome chemistry in this automatic barcode detection. The cellranger-atac workflow has an option to run the ARC chemistry, but I'd like to use simpleaf for the scRNA quant side of things.

Thank you!