Dear developers,
Thanks a lot for this amazing tool!
I would like to ask about Strelka's support for Q-score binned FASTQs. Q-score binning is currently the default option for new sequencing machines, e.g. NextSeq1000, NextSeq2000 or NovaSeq6000. After Q-score binning is applied, bases with score 0-2 are assigned score 0. As a result, some bases with non-N nucleotides (i.e. ACGT) will have score zero.

For a validation sample, I applied the Q-score binning only to bases with score 0-2. After I ran the sample through Strelka, I saw hundreds of somatic variant calls supported by a single read. I also observed that ~1% of the expected somatic variant calls were no longer called, and ~0.15% of the expected germline variant calls were no longer called.

When I then replaced those non-N nucleotides with quality zero by Ns, the somatic and germline variant calls were identical to those obtained before the Q-score binning. The only difference was that the read support of a variant was typically one read lower, which is expected.
Do I understand correctly that, after Q-score binning, non-N nucleotides with quality zero need to be masked with Ns to get correct variant calls from Strelka somatic and Strelka germline?
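For reference, the masking step I describe can be sketched roughly as below. This is only an illustration and not the exact script I ran: the function names `mask_zero_quality` and `mask_fastq` are made up for this post, and the sketch assumes Phred+33 quality encoding and plain four-line FASTQ records.

```python
from typing import Iterable, Iterator


def mask_zero_quality(seq: str, qual: str, offset: int = 33) -> str:
    """Replace every base whose Phred score is 0 with 'N'.

    Assumes Phred+33 encoding by default, so '!' encodes a score of 0.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return "".join(
        "N" if ord(q) - offset == 0 else base
        for base, q in zip(seq, qual)
    )


def mask_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with zero-quality bases masked.

    Assumes strict four-line records (header, sequence, '+', quality).
    Quality strings are left unchanged; only the bases are masked.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            yield header
            yield mask_zero_quality(seq, qual)
            yield plus
            yield qual
            record = []


if __name__ == "__main__":
    example = ["@read1\n", "ACGT\n", "+\n", "!II!\n"]
    for out_line in mask_fastq(example):
        print(out_line)
```

In the small example at the bottom, the first and last bases have quality `!` (score 0), so they are the only ones turned into N.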
Many thanks,
Ivan