Closed A-N-Other closed 2 years ago
Hi George,
thanks for your message. So, in order to understand what might be going on I need 2 things:
This way I can reproduce the putative issue. Best,
Danny
Thanks, Danny. I've shared a Dropbox link with you (@rug.nl email) - hope that's OK. I dug around for the smallest BAM I could find that displayed the issue without having to subsample and re-do rf-count
. I've shared with you one of the DMSO control samples from an old Flu (PR8) SHAPE-MaP experiment. In the tarball you should have the BAM and index, the .rc
file and the two different sets of depths that you get by running:
~/bin/RNAFramework/rf-rctools view -t rf-count/DMSO-1.rc > depths_rf-rctools.txt
and
samtools depth -a rf-map/DMSO-1.bam > depths_samtools.txt
Directly from my SLURM submission script, the exact command run was:
~/bin/RNAFramework/rf-count \
--processors 1 \
--working-threads 8 \
--sorted \
--tmp-dir "$tmp" \
--output-dir rf-count/"$base" \
--overwrite \
--fasta reference/reference.fa \
--mask-file reference/rf-count.blacklist \
--count-mutations \
--min-quality 20 \
--eval-surrounding \
--collapse-consecutive \
--max-collapse-distance 6 \
--right-deletion \
"$BAM"
Many thanks for your help with this!
George
Can you please send also the reference.fa file?
Attached that for you here ... .fa.txt
so that GitHub will take it.
reference.fa.txt
Hi George,
so, the "issue" was caused by the fact that, by default, samtools depth excludes certain reads, for example reads marked as PCR/optical duplicates. rf-count did not make such a selection. I have now added this filtering by default. See the resulting coverage below.
It can be turned off by setting "-ndd" or "--no-discard-duplicates".
I hope this helps and solves the issue. Thanks so much for reporting this (unexpected) behaviour! Best,
Danny
Fantastic! Many thanks for looking into this so quickly. I'll pull down the latest version and re-run my pipeline.
Great. Also, please re-pull the latest update. I noticed that, by mistake after my changes, the total % of mutated reads in the output was wrong (twice as much what it was supposed to be). It's fixed now. This did not affect the RC files.
Thanks, Danny. Working exactly as I'd expect at this point.
Hi Danny
I'm in the process of putting together a script that scans a working directory for RNAFramework outputs and generates a variety of QC plots for what it finds - sequence coverage, average baseQ across the transcript from the BAMs, summaries of % read mapping, per-base reactivity plots, etc. However, when getting transcript coverage data from an
.rc
file usingrf-rctools view
I was getting unrealistically high depths and, if I compare the numbers to whatsamtools
reports, they're very different!RNAFramework (I'm running 2.7.2):
rf-rctools view rf-count/5NIA-1.rc | sed -n '4 s/,/\n/gp' > depths_rf-rctools.txt
samtools:samtools depth -a rf-map/5NIA-1.bam | cut -f3 > depths_samtools.txt
For two different samples from two totally separate experiments (the second was from May and run with a different version of RNAFramework), here are two comparison plots: DepthComparison.pdf
What's even more strange to me is that the profiles of those lines are also different! Out of interest I also ran
rf-wiggle --ratio
and it also uses therf-rctools
version of the depths in the calculation.Can you shed some light on this for me - am I misinterpreting what's being reported? Thanks for your help!