COMBINE-lab / alevin-fry

🐟 🔬🦀 alevin-fry is an efficient and flexible tool for processing single-cell sequencing data, currently focused on single-cell transcriptomics and feature barcoding.
https://alevin-fry.readthedocs.io
BSD 3-Clause "New" or "Revised" License

doc on read geometry #94

Closed fransua closed 1 year ago

fransua commented 1 year ago

Hi, I cannot find clear help on read_geometry. There are a couple of examples but they do not seem to work for me and I am struggling to change them. Specifically I have several questions:

thanks

rob-p commented 1 year ago

Hi @fransua,

The geometry tag describes how the UMI, cell barcode, and "biological" read sequence are parsed during the mapping phase (upstream of alevin-fry). Therefore, the most appropriate place to address these queries is in the GitHub issues for the salmon repository. I will repost this there and tag you.
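To make the idea of a fixed-position geometry concrete, here is a minimal Python sketch. It is not salmon's parser: the `Geometry` class, the `split_read` helper, and the 16-nt barcode / 12-nt UMI layout are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    """Hypothetical fixed-position layout; coordinates are 1-based, inclusive."""
    bc_start: int
    bc_end: int
    umi_start: int
    umi_end: int


def split_read(read1: str, read2: str, geo: Geometry) -> tuple[str, str, str]:
    """Return (barcode, umi, biological_sequence) for one read pair.

    Assumes the barcode and UMI both sit on read 1 and that read 2 is
    entirely biological sequence.
    """
    if len(read1) < max(geo.bc_end, geo.umi_end):
        raise ValueError("read 1 is shorter than the described geometry")
    # Convert 1-based inclusive coordinates to Python's 0-based half-open slices.
    barcode = read1[geo.bc_start - 1:geo.bc_end]
    umi = read1[geo.umi_start - 1:geo.umi_end]
    return barcode, umi, read2


# Assumed layout for illustration: 16-nt barcode followed by a 12-nt UMI on read 1.
EXAMPLE_GEOMETRY = Geometry(bc_start=1, bc_end=16, umi_start=17, umi_end=28)
```

A layout like this only works when every read carries the barcode and UMI at the same offsets, which is why protocols with variable offsets need extra handling.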

As to your last question: salmon can deal with inexact locations for barcodes / UMIs, but that is an "advanced" geometry setting that is not yet in the released version of the tool (basically, one where you can describe the sequence motif that must occur upstream of the barcode and UMI). If you can describe your protocol there in a bit more detail, we could implement a custom flag to parse the information for such reads. Alternatively, you can use another package (e.g. UMI-tools) to "normalize" the barcode and UMI geometry into a simpler format that salmon can handle directly.
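As a rough sketch of what such "normalization" could look like, the Python snippet below finds an anchor motif and pulls out the barcode and UMI that follow it, so they end up at fixed positions. This is not how salmon or UMI-tools implement it: the `normalize_read` function, the exact-match search, and the anchor and lengths in the usage line are hypothetical and would have to be adapted to the actual protocol.

```python
from typing import Optional


def normalize_read(seq: str, anchor: str, bc_len: int, umi_len: int) -> Optional[str]:
    """Return barcode+UMI as one fixed-length string, or None if not recoverable.

    Assumes the barcode immediately follows the first exact occurrence of
    `anchor`, and the UMI immediately follows the barcode.
    """
    pos = seq.find(anchor)
    if pos == -1:
        return None  # anchor motif absent: the read cannot be normalized
    start = pos + len(anchor)
    end = start + bc_len + umi_len
    if end > len(seq):
        return None  # read too short to hold the full barcode and UMI
    return seq[start:end]


# Hypothetical read: 2-nt variable prefix, "ACGT" anchor, 4-nt barcode, 3-nt UMI.
normalized = normalize_read("TTACGTAAAACCCGG", anchor="ACGT", bc_len=4, umi_len=3)
```

After this step the barcode always occupies positions 1 to `bc_len` and the UMI the following `umi_len` bases, so a simple fixed-position geometry is enough downstream. Real data would likely also need tolerance for sequencing errors in the anchor, which this exact-match sketch does not handle.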

Best, Rob