Shashankti closed this issue 8 months ago
First, the initial warning:
| Number highly reactive check:
| 0.0% (0/67) nucleotides show high apparent reactivity.
| FAIL
ShapeMapper2 is indicating that while most nucleotides have a positive mutation rate, none of those rates is high enough to count as highly reactive. This could be due to any of the possible causes it lists:
| Possible causes:
| - DNA contamination
| - poor mixing of chemical reagents and RNA and/or poor
| reagent diffusion (if modifying in cells), resulting
| in low modification rates
| - expired reagents, resulting in low modification rates
| - poor reverse transcription conditions, resulting in
| low adduct read-through
| - extremely highly structured RNA
Second, regarding the alignment, I don't think I am seeing the same issue that you are. It is easy to be distracted by the alignment rates of the PAIRED reads, which only represent a few percent of your samples. I've removed that information here to highlight the actual total alignment rates for each sample.
Edit: "Paired reads" is misleading here. ShapeMapper2 first performs read merging, then alignment. During read merging, R1 and R2 are combined into a single read and passed to bowtie as an "unpaired" fasta file. The "paired reads" are the unmerged reads, which are passed to bowtie as separate R1 and R2 files. Having a high percentage of "unpaired" reads is a good thing.
|BowtieAligner (sample: Denatured) output message:
|-------------------------------------------------
| ...
| 97.95% overall alignment rate
|BowtieAligner (sample: Untreated) output message:
|-------------------------------------------------
| ...
| 98.23% overall alignment rate
|BowtieAligner (sample: Modified) output message:
|------------------------------------------------
| 98.51% overall alignment rate
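To compare the total alignment rates across samples without being distracted by the per-category lines, the summary lines can be pulled out of the log programmatically. The following is a minimal sketch, assuming the log uses exactly the `BowtieAligner (sample: ...) output message:` headers and `% overall alignment rate` lines shown above; the function name `overall_alignment_rates` is hypothetical and not part of ShapeMapper2.

```python
import re


def overall_alignment_rates(log_text):
    """Map each sample name to its 'overall alignment rate' percentage."""
    rates = {}
    sample = None
    for raw in log_text.splitlines():
        # Drop the leading '|' and spaces that prefix each log line.
        line = raw.lstrip("| ").strip()
        header = re.match(r"BowtieAligner \(sample: (.+?)\) output message:", line)
        if header:
            sample = header.group(1)
            continue
        rate = re.match(r"([\d.]+)% overall alignment rate", line)
        if rate and sample is not None:
            rates[sample] = float(rate.group(1))
    return rates


# Log excerpt copied from this thread.
log = """\
|BowtieAligner (sample: Denatured) output message:
|-------------------------------------------------
| ...
| 97.95% overall alignment rate
|BowtieAligner (sample: Untreated) output message:
|-------------------------------------------------
| ...
| 98.23% overall alignment rate
|BowtieAligner (sample: Modified) output message:
|------------------------------------------------
| 98.51% overall alignment rate
"""

for name, pct in overall_alignment_rates(log).items():
    print(f"{name}: {pct}%")
```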
Thank you so much for the clarification. I misunderstood the alignment stats. Can you confirm that having overrepresented sequences after the filtering and merging is standard behavior? I was not able to see that in the example run files.
Thanks
In short, the more informative warning is the one you are getting from ShapeMapper2. The low mutation rates in your treated sample are the problem.
From FastQC documentation:
Because the duplication detection requires an exact sequence match over the whole length of the sequence any reads over 75bp in length are truncated to 50bp for the purposes of this analysis. Even so, longer reads are more likely to contain sequencing errors which will artificially increase the observed diversity and will tend to underrepresent highly duplicated sequences.
However, this also means that there may be chemical-adduct-induced mutations which FastQC does not see, causing the program to overreport duplicated sequences. I would expect that ~90% of reads belonging to a single sequence is typical for an amplicon experiment with low mutation rates.
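The effect of the truncation can be illustrated with a toy example. This is only a sketch of the rule quoted from the FastQC documentation (reads over 75 bp are cut to 50 bp before exact matching), not FastQC's actual implementation; the function name `duplication_fraction` and the synthetic reads are made up for illustration.

```python
from collections import Counter


def duplication_fraction(reads, truncate_to=50, max_full_length=75):
    """Fraction of reads that share the most common sequence.

    Reads longer than max_full_length are truncated to truncate_to bases
    before exact-match counting, mimicking the rule quoted above.
    """
    keys = [r[:truncate_to] if len(r) > max_full_length else r for r in reads]
    top_count = Counter(keys).most_common(1)[0][1]
    return top_count / len(keys)


# Synthetic 100 nt amplicon (hypothetical sequence, for illustration only).
amplicon = "ACGT" * 25
# One mutation past the first 50 bases, one within them.
mutated_late = amplicon[:80] + "C" + amplicon[81:]
mutated_early = amplicon[:10] + "C" + amplicon[11:]

reads = [amplicon] * 6 + [mutated_late] * 3 + [mutated_early]

# With truncation, the late mutation is invisible to the comparison.
print(duplication_fraction(reads))
# With truncation disabled, the late-mutated reads count as distinct.
print(duplication_fraction(reads, max_full_length=10**9))
```

In this toy set, reads that differ only after position 50 collapse into the unmutated sequence once truncated, so the dominant sequence looks more abundant than it is at full length.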
Thanks again for the help.
I have been having some issues running the shapemapper2 pipeline on one of our samples. The pipeline gives the error:
I decided to check the alignment stats to try to understand the error and this was the output:
I looked at the aligned SAM files and found this sequence to be significantly overrepresented:
![image](https://github.com/Weeks-UNC/shapemapper2/assets/50117820/2db41d1f-73cb-4124-85a9-e6399265e137)
For reference, this is the target.fa file used for the run:
>mir-132_RNA
taatgggagaccgcccccgcgtctCCAGGGCAACCGTGGCTTTCGATTGTTACTGTGGGAACTGGAGGTAACAGTCTACAGCCATGGTCGCcccgcagcacgcccacgcgcattg
It seems that the reverse complement sequence is not read in properly. Can you please let me know what could be the cause of this error, or if this behavior is normal? Thank you for the help.
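One way to check whether an overrepresented read corresponds to the target or to its reverse complement is a direct string comparison against both strands. The sketch below uses the target sequence from target.fa above; the helper names `reverse_complement` and `matches_either_strand` are hypothetical, and the example read is just a slice of the target, since the overrepresented sequence itself is only available as an image.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, preserving case."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def matches_either_strand(read, target):
    """Report whether read occurs in target ('forward'), in its reverse
    complement ('reverse'), or in neither (None). Comparison ignores case."""
    read_u = read.upper()
    target_u = target.upper()
    if read_u in target_u:
        return "forward"
    if read_u in reverse_complement(target_u):
        return "reverse"
    return None


# Target sequence copied from the target.fa shown in this thread.
target = (
    "taatgggagaccgcccccgcgtctCCAGGGCAACCGTGGCTTTCGATTGTTACTGTGGGAACTGGAGG"
    "TAACAGTCTACAGCCATGGTCGCcccgcagcacgcccacgcgcattg"
)

# Placeholder read: a slice of the target, standing in for a real read.
example_read = target[10:40]
print(matches_either_strand(example_read, target))
print(reverse_complement(target))
```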