XueyiDong / LongReadBenchmark

Benchmarking long-read RNA-seq analysis tools
MIT License

Question about count preprocessing for DTE #2

Open sparthib opened 3 weeks ago

sparthib commented 3 weeks ago

Hi @XueyiDong ,

I have used bambu and IsoQuant to quantify my ONT data. For DTE analysis using edgeR, did you calculate overdispersion and use it to adjust your counts matrix before filtering by expression here?

Could you give me more clarification on what steps in edgeR you followed for DTE analysis?

Thank you, Sowmya

XueyiDong commented 2 weeks ago

Hi Sowmya,

Thank you for your questions!

The currently recommended edgeR workflow for DTE analysis estimates an overdispersion for each transcript and divides that transcript's counts by it during preprocessing. However, the work in this repository was done before that method was published, so I didn't adjust the counts using overdispersion. I only calculated the overdispersion on the main dataset to visualize the transcriptome mapping ambiguity.

You can refer to the scripts for the steps I used in edgeR for DTE analysis in my paper, but I would suggest you use the updated edgeR workflow if you are analyzing new ONT data.

Reference: https://doi.org/10.1093/nar/gkad1167

Cheers, Xueyi

sparthib commented 2 weeks ago

Thank you for your response, @XueyiDong!

I see that the overdispersion is calculated specifically from the outputs of salmon and kallisto. For methods that don't offer bootstrap resampling, would the best approach be to simply use the raw counts instead of the counts scaled by overdispersion?

Sowmya

XueyiDong commented 2 weeks ago

You are right, the overdispersion is calculated using bootstrap samples generated by salmon or kallisto. If you quantify the counts using bambu or IsoQuant, you can use the raw counts for DTE analysis.
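
For readers following the thread, the distinction above can be illustrated with a minimal sketch in plain Python. It is not edgeR code and not the scripts in this repository. The function names `overdispersion` and `prepare_counts`, the transcript names, and the toy numbers are all hypothetical. The estimate here is just the pooled bootstrap variance-to-mean ratio floored at 1; edgeR's own estimator additionally moderates the per-transcript values, and that step is omitted.

```python
from statistics import mean, variance


def overdispersion(bootstraps_per_sample):
    """Pooled variance/mean ratio of bootstrap counts for one transcript.

    bootstraps_per_sample: one list of bootstrap counts per sample.
    Simplified illustration only: no moderation across transcripts.
    """
    weighted_sum = 0.0
    df = 0
    for boots in bootstraps_per_sample:
        m = mean(boots)
        if m > 0 and len(boots) > 1:
            # (n - 1) * variance / mean, i.e. sum of squares over the mean
            weighted_sum += (len(boots) - 1) * variance(boots) / m
            df += len(boots) - 1
    if df == 0:
        return 1.0
    # Floor at 1 so counts are never inflated
    return max(weighted_sum / df, 1.0)


def prepare_counts(counts, bootstraps=None):
    """Return counts ready for filtering and DTE testing.

    counts: dict mapping transcript -> list of per-sample counts.
    bootstraps: dict mapping transcript -> per-sample bootstrap lists,
                or None when the quantifier provides no bootstraps.
    """
    if bootstraps is None:
        # No bootstrap resampling available: keep the raw counts
        return {tx: list(row) for tx, row in counts.items()}
    return {
        tx: [c / overdispersion(bootstraps[tx]) for c in row]
        for tx, row in counts.items()
    }


# Toy data (hypothetical): two transcripts, two samples, three bootstraps each
counts = {"tx_unambiguous": [100.0, 120.0], "tx_ambiguous": [100.0, 120.0]}
bootstraps = {
    "tx_unambiguous": [[98, 100, 102], [119, 120, 121]],
    "tx_ambiguous": [[60, 100, 140], [80, 120, 160]],
}

raw = prepare_counts(counts)                 # no bootstraps: raw counts
scaled = prepare_counts(counts, bootstraps)  # bootstraps: scaled counts
```

In this toy example, the transcript whose bootstrap counts vary widely is scaled down, while the stable transcript keeps its counts because its ratio falls below the floor of 1. Calling `prepare_counts` without bootstraps returns the raw counts unchanged.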