StevenWingett / FastQ-Screen

Detecting contamination in NGS data and multi-species analysis
https://stevenwingett.github.io/FastQ-Screen/
GNU General Public License v3.0

Discrepancy in ambiguous alignments between default and bisulfite mode #6

Closed: FelixKrueger closed this issue 4 years ago

FelixKrueger commented 5 years ago

Currently, there seems to be a discrepancy in how ambiguously mappable sequences are counted between the default mode and the --bisulfite mode. Here is an example of a human RRBS sample that was aligned with FastQ Screen in default mode:

[Image: non-bisulfite]

Default mode produces hardly any uniquely aligned reads, which is fine as this is a bisulfite library. Of note, ~35% of the sample consists of microsatellite sequence, a multimer of (TGGAA)n (see also https://github.com/FelixKrueger/Bismark/issues/265). This satellite repeat contamination, which is present in all animal species tested, is responsible for a generally low unique mapping efficiency.

When I ran FastQ Screen in --bisulfite mode, it did identify the sample as mainly human but, interestingly, it did not show the ambiguously aligned microsatellite sequences in all species:

[Image: fq_screen_plot]

I suspect that the counting of ambiguous alignments in --bisulfite mode might be missing this contaminant. Maybe this has to do with the formatting of the read ID that is written out into the ambiguous.fastq file?

StevenWingett commented 4 years ago

Added --score_min L,0,-0.6 as a Bismark/Bowtie2 mapping parameter so that FastQ Screen maps less stringently in --bisulfite mode. This is better suited to a QC tool and more consistent with non-bisulfite FastQ Screen mapping.

Git commit: 517bee150acb14bcf9d9822f658afd530c8698a4
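
For readers unfamiliar with the parameter: in Bowtie2, a linear score-min function L,a,b sets the minimum acceptable alignment score to a + b * read_length, so a more negative slope tolerates more mismatches and gaps. The sketch below only illustrates that arithmetic; it is not code from FastQ Screen or Bismark, and the function name min_alignment_score is hypothetical.

```python
def min_alignment_score(score_min: str, read_length: int) -> float:
    """Evaluate a linear Bowtie2-style score-min spec such as 'L,0,-0.6'.

    Hypothetical helper for illustration only. It handles just the
    linear (L) form: threshold = intercept + slope * read_length.
    """
    kind, intercept, slope = score_min.split(",")
    if kind != "L":
        raise ValueError("this sketch only handles linear (L) functions")
    return float(intercept) + float(slope) * read_length


if __name__ == "__main__":
    # Compare a stricter and a more permissive slope for a 50 bp read.
    for spec in ("L,0,-0.2", "L,0,-0.6"):
        print(spec, min_alignment_score(spec, 50))
```

For a 50 bp read, L,0,-0.6 gives a threshold of -30, whereas L,0,-0.2 (Bismark's documented default, as far as I know) gives -10. The threshold of -30 is close to what Bowtie2's own end-to-end default (L,-0.6,-0.6) would allow, which is presumably why the change brings the two modes closer together.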