How to realize umi-tools directional algorithm in alevin-fry

COMBINE-lab / alevin-fry

🐟 🔬🦀 alevin-fry is an efficient and flexible tool for processing single-cell sequencing data, currently focused on single-cell transcriptomics and feature barcoding.

https://alevin-fry.readthedocs.io

BSD 3-Clause "New" or "Revised" License


How to realize umi-tools directional algorithm in alevin-fry #137

Closed tengdiyouying608 closed 8 months ago

tengdiyouying608 commented 8 months ago

Hi Alevin-fry developers,

I used to use UMI-tools, but now I want to switch to SimpleAF to significantly reduce processing time. How can I configure the parameters in SimpleAF to implement the “directional” method used in UMI-tools?

rob-p commented 8 months ago

Hi @tengdiyouying608,

The original directional algorithm of UMI-tools isn't implemented in alevin-fry. Rather, we implement the improved (but conceptually related) version of this algorithm presented in the original alevin paper, which is co-authored with the authors of UMI-tools. Specifically, rather than doing a greedy directional collapse, the new algorithm attempts to cover the graph using a minimal number of "arborescences" (directed trees), each of which can be explained by a single molecule. This method is available in alevin-fry by asking for the parsimony resolution mode. If there is a specific reason you're interested in the directional algorithm itself though, or a use case where you think it is clearly most appropriate, @DongzeHE and I would be happy to discuss!

Best, Rob
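
For readers unfamiliar with the greedy directional collapse mentioned above, here is a minimal sketch of the rule as it is usually described for UMI-tools: UMI `a` is connected to UMI `b` when they differ by at most one base and `count(a) >= 2 * count(b) - 1`, and each group is everything reachable from the most abundant UMI not yet assigned. This is an illustration only, not code from UMI-tools or alevin-fry; the function names and the toy counts are hypothetical, and all UMIs are assumed to have the same length.

```python
from collections import deque


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_groups(counts, max_dist=1):
    """Greedy directional grouping of UMIs for one gene in one cell.

    counts maps each UMI sequence to its read count (toy input, assumed).
    Returns a list of groups; each group is taken to be one molecule.
    """
    # Visit UMIs from most to least abundant (ties broken alphabetically).
    umis = sorted(counts, key=lambda u: (-counts[u], u))

    # Directed edge a -> b: b looks like a sequencing error derived from a.
    adj = {
        a: [
            b for b in umis
            if b != a
            and hamming(a, b) <= max_dist
            and counts[a] >= 2 * counts[b] - 1
        ]
        for a in umis
    }

    seen = set()
    groups = []
    for root in umis:
        if root in seen:
            continue
        seen.add(root)
        group = [root]
        queue = deque([root])
        # Breadth-first search: claim every unassigned UMI reachable from root.
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    group.append(nxt)
                    queue.append(nxt)
        groups.append(group)
    return groups


# Hypothetical read counts per UMI.
toy_counts = {"ACGT": 100, "ACGC": 90, "GGGG": 5, "ACGA": 3, "TCGA": 1}
toy_groups = directional_groups(toy_counts)
print(len(toy_counts), "distinct UMIs ->", len(toy_groups), "groups")
```

In this toy input, a 'unique'-style count would report every distinct UMI as a molecule, whereas the directional rule merges the low-count neighbours into the abundant UMI they most plausibly derive from. Note that nothing in this sketch checks whether the merged UMIs are compatible with a common transcript.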

tengdiyouying608 commented 8 months ago

Hi @rob-p, thank you for your prompt response. On my data, I tested the 'parsimony' parameter and compared it with the 'unique' (left) and 'directional' (right) methods in UMI-tools. The x-axis represents gene expression obtained through UMI-tools, and the y-axis represents gene expression obtained through 'parsimony'. The results from 'parsimony' seem closer to UMI-tools' 'unique', which is essentially the uncorrected output, and that is why I'm attempting to reproduce 'directional' in alevin-fry. When the values exceed 1200, the differences between the two become more pronounced. If there are any details or information I might have overlooked, I would greatly appreciate it if you could let me know!

[Attached image 1709603945126: scatter plots of gene expression, UMI-tools (x-axis) vs. 'parsimony' (y-axis); left: 'unique', right: 'directional']

Best, Margarita

rob-p commented 8 months ago

Hi Margarita (@tengdiyouying608),

Very interesting! This is certainly something worth understanding better. Could you say a little bit about what type of data this is? Is this data that you can share for testing purposes? (We can of course do this offline, e.g. via e-mail, if that would be better.) @DongzeHE and I would be happy to take a look. I wouldn't generally expect so large a difference, but seeing it, I might also worry that there's over-collapse under the UMI-tools model (which, afaik, doesn't have the same constraint that there must be a transcript capable of supporting all of the collapsed UMIs). Either way, we'd be happy to look more if the data, or a reproducible part of it, is shareable.

Best, Rob
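
The constraint mentioned above (a set of UMIs may only be collapsed if some single transcript can support all of them) can be illustrated with a deliberately simplified sketch. It assumes each UMI comes with the set of transcripts its reads are compatible with; the function names and transcript labels are hypothetical, and this is not alevin-fry's actual implementation, which works on a graph rather than on plain sets.

```python
def shared_transcripts(label_sets):
    """Transcripts compatible with every UMI in a candidate group."""
    sets = [set(s) for s in label_sets]
    if not sets:
        return set()
    return set.intersection(*sets)


def can_collapse(label_sets):
    """True if at least one transcript could explain all UMIs in the group."""
    return bool(shared_transcripts(label_sets))


# Hypothetical transcript labels for the reads behind each UMI.
print(can_collapse([{"t1", "t2"}, {"t2", "t3"}]))  # both share t2
print(can_collapse([{"t1"}, {"t2"}]))              # no common transcript
```

Under this view, two UMIs that are one mismatch apart but map to disjoint transcript sets stay separate, which is one way a transcript-aware method could report more molecules than a purely sequence-based collapse.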

wangjiawen2013 commented 8 months ago

Rather than over-collapse under the UMI-tools model, I am worried about under-collapse in alevin-fry, because I found that the UMI-tools results are more consistent with bulk RNA-seq.

tengdiyouying608 commented 8 months ago

Hi @rob-p,

I have sent the data to rob@cs.umd.edu. Thank you again for looking into this.

Best, Margarita.