dincarnato / RNAFramework

RNA structure probing and post-transcriptional modifications mapping high-throughput data analysis
http://www.rnaframework.com
GNU General Public License v3.0

Mutation counts from secondary alignments? #44

Closed: physnano closed this issue 10 months ago

physnano commented 11 months ago

Hi Danny,

I have a question about how rf-count handles alignments and SAM flags. In the output of rf-rctools view, I see far fewer T->C mutations than I would expect ({rfcount_7sK.jpg}: no position has more than 1 mutation), whereas IGV shows more than 9 positions with more than 10 T->C mutations each ({IGV_7sK.jpg}). When I look at the SAM flags for these alignments, secondary alignments (~704) far outnumber primary alignments (~56) ({sam_file_flag_breakdown_7sk.jpg}), which suggests that rf-count is not counting mutations from secondary alignments. The coverage output, however, does appear to be correct. Since rf-count has an option to consider only primary alignments ("-pn"), I would not have expected this to be the default behavior. Can you comment on how rf-count handles secondary alignments?

[Attached images: IGV_7sK.jpg, rfcount_7sK.jpg, sam_file_flag_breakdown_7sk.jpg]

Thank you again and best regards,

Will
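A flag breakdown like the one described in the question can be reproduced from a SAM file by testing the FLAG bits defined in the SAM specification: 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary). The following is a minimal Python sketch; the function names `classify_flag` and `tally_sam_lines` and the toy records are hypothetical, and the sketch does not reflect how rf-count itself reads alignments.

```python
from collections import Counter

# FLAG bits from the SAM specification
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def classify_flag(flag: int) -> str:
    """Return the alignment category encoded in a SAM FLAG value."""
    if flag & FLAG_UNMAPPED:
        return "unmapped"
    if flag & FLAG_SECONDARY:
        return "secondary"
    if flag & FLAG_SUPPLEMENTARY:
        return "supplementary"
    return "primary"


def tally_sam_lines(lines):
    """Count alignment categories from SAM text lines, skipping @ header lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[classify_flag(int(fields[1]))] += 1  # FLAG is the 2nd column
    return counts


if __name__ == "__main__":
    # Hypothetical toy records: one forward primary, one secondary without SEQ,
    # one reverse-strand primary (flag 16)
    example = [
        "@SQ\tSN:toy_ref\tLN:100",
        "r1\t0\ttoy_ref\t10\t60\t5M\t*\t0\t0\tACGTA\t*",
        "r2\t256\ttoy_ref\t40\t0\t5M\t*\t0\t0\t*\t*",
        "r3\t16\ttoy_ref\t12\t60\t5M\t*\t0\t0\tACGTA\t*",
    ]
    print(dict(tally_sam_lines(example)))  # {'primary': 2, 'secondary': 1}
```

For a BAM file, `samtools view -h` produces SAM text that can be fed to the same function line by line, and `samtools flagstat` reports secondary and supplementary counts directly.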

dincarnato commented 11 months ago

Hi Will,

As you can see from the number of options available in rf-count, there are many reasons a mutation might be ignored/discarded.

Can you please give me an example BAM file, corresponding Fasta, and command line you executed?

Danny

dincarnato commented 11 months ago

Also, please make sure you git pull the latest version, as there was a bug in the "--only-mut" analysis that has now been fixed.

dincarnato commented 10 months ago

Hi Will,

any update here?

Cheers, Danny

physnano commented 10 months ago

Hi Danny,

7SK seems to be a particularly problematic case for what I have attempted, because of the multiple transcripts/pseudogenes for this ncRNA. In this case, primary alignments were contributing to the coverage at secondary locations but not to the mutation counts, even without the primary-only option (or maybe I am interpreting the output incorrectly). It would also be nice to have the coverage for each transcript reflected not only in the .rc file but also in the "raw_counts" output. That said, I think I have a handle on things now :) .

Overall, thank you so much for your help and for developing this amazing package.

Best,

Will
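A pattern in which a locus gains coverage but no mutation counts can arise, for example, when alignment records carry no stored read sequence: the SAM specification allows SEQ to be `*`, and some aligners write it that way for secondary alignments. The sketch below is a hypothetical illustration, not rf-count's algorithm: the function `add_alignment` walks a CIGAR string, adds coverage for every aligned base, and counts a T->C mutation only when the read sequence is available. Its `primary_only` switch is an assumption, loosely modelled on the idea of a primary-only filter.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
FLAG_SECONDARY = 0x100


def add_alignment(ref_seq, pos, cigar, seq, flag, coverage, t_to_c, primary_only=False):
    """Accumulate per-position coverage and T->C counts for one alignment.

    pos is the 1-based leftmost mapping position (SAM POS field).
    A SEQ of '*' still adds coverage but cannot contribute mutations.
    SEQ is compared as stored in SAM (forward reference strand);
    strand-aware conversion calling is out of scope for this sketch.
    """
    if primary_only and flag & FLAG_SECONDARY:
        return  # hypothetical primary-only filter: skip secondary records
    ref_i = pos - 1  # 0-based reference index
    read_i = 0       # 0-based index into SEQ
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            for k in range(length):
                coverage[ref_i + k] += 1
                if seq != "*" and ref_seq[ref_i + k] == "T" and seq[read_i + k] == "C":
                    t_to_c[ref_i + k] += 1
            ref_i += length
            read_i += length
        elif op in "IS":
            read_i += length  # consumes the read only
        elif op in "DN":
            ref_i += length   # consumes the reference only; no coverage added here
        # H and P consume neither the read nor the reference


if __name__ == "__main__":
    ref = "ATTGTACT"  # hypothetical toy reference
    cov = [0] * len(ref)
    tc = [0] * len(ref)
    add_alignment(ref, 1, "8M", "ACTGCACT", 0, cov, tc)  # primary with two T->C
    add_alignment(ref, 2, "3M", "*", 256, cov, tc)       # secondary, SEQ not stored
    print(cov)  # [1, 2, 2, 2, 1, 1, 1, 1]
    print(tc)   # [0, 1, 0, 0, 1, 0, 0, 0]
```

In this toy run, positions 2 to 4 gain coverage from the secondary record while its lack of a stored sequence leaves the T->C counts unchanged; passing `primary_only=True` would drop that record entirely.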

dincarnato commented 10 months ago

Hi Will,

If you can provide me with an example BAM file and explain the issue to me in detail, I may be able to adjust things in the framework. You can send it to me via email.

Cheers, Danny