Illumina / strelka

Strelka2 germline and somatic small variant caller
GNU General Public License v3.0

Incorrect handling of non-N nucleotides with quality zero after Q-score binning #230

Open ivan-mh opened 1 year ago

ivan-mh commented 1 year ago

Dear developers,

Thanks a lot for this amazing tool!

I would like to ask about Strelka's support for Q-score-binned FASTQs. Q-score binning is currently the default on new sequencing instruments, e.g. the NextSeq 1000, NextSeq 2000 and NovaSeq 6000. After binning, bases with scores 0-2 are assigned score 0, so some bases with non-N nucleotides (i.e. A, C, G or T) end up with quality zero.

To test this, I applied Q-score binning to a validation sample, restricted to bases with scores 0-2. When I ran the binned sample through Strelka, I saw hundreds of somatic variant calls supported by a single read. In addition, ~1% of the expected somatic variant calls and ~0.15% of the expected germline variant calls were no longer called. When I then masked the non-N nucleotides with quality zero by replacing them with Ns, Strelka called the same somatic and germline variants as before binning. The only difference was that read support was typically one read lower, which is expected.
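The partial binning used in this experiment (only scores 0-2 collapsed to 0) can be sketched in Python as below. This is a minimal illustration, assuming Phred+33 quality encoding; the function name `bin_lowest_qualities` is hypothetical, and real instrument binning also merges higher scores into a few bins, which this sketch deliberately does not do.

```python
# Assumption: Phred+33 (Sanger / Illumina 1.8+) quality encoding.
PHRED_OFFSET = 33


def bin_lowest_qualities(qual: str, max_low: int = 2) -> str:
    """Set every Phred score in 0..max_low to 0; leave higher scores untouched.

    Hypothetical helper that mimics only the lowest bin of Q-score binning.
    """
    out = []
    for ch in qual:
        q = ord(ch) - PHRED_OFFSET
        # Scores 0..max_low become Q0 ('!'); everything else is kept as-is.
        out.append(chr(PHRED_OFFSET) if q <= max_low else ch)
    return "".join(out)


# Q0, Q1, Q2, Q3, Q40 -> Q0, Q0, Q0, Q3, Q40
print(bin_lowest_qualities("!\"#$I"))  # prints !!!$I
```

Any A/C/G/T base that originally had Q1 or Q2 now carries Q0, which is the situation the question is about.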

Do I understand correctly that, after Q-score binning, non-N nucleotides with quality zero need to be masked (replaced by Ns) to get correct variant calls from Strelka somatic and Strelka germline?
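For reference, the masking workaround described above (replacing Q0 bases with N before running Strelka) could look roughly like the Python sketch below. Whether this masking is actually required is exactly the open question here; the sketch only reproduces the workaround. It assumes Phred+33 encoding and unwrapped 4-line FASTQ records, and the names `mask_q0_bases` and `mask_fastq` are hypothetical. Quality strings are left unchanged, so masked bases stay at Q0.

```python
import io
from typing import TextIO

# Assumption: Phred+33 (Sanger / Illumina 1.8+) quality encoding.
PHRED_OFFSET = 33


def mask_q0_bases(seq: str, qual: str) -> str:
    """Return seq with every base whose Phred score is 0 replaced by 'N'."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return "".join(
        "N" if ord(q) - PHRED_OFFSET == 0 else base
        for base, q in zip(seq, qual)
    )


def mask_fastq(src: TextIO, dst: TextIO) -> None:
    """Copy 4-line FASTQ records from src to dst, masking Q0 bases.

    Assumes unwrapped records (one sequence line and one quality line each).
    """
    while True:
        header = src.readline()
        if not header:
            break  # end of input
        seq = src.readline().rstrip("\r\n")
        plus = src.readline()
        qual = src.readline().rstrip("\r\n")
        dst.write(header)
        dst.write(mask_q0_bases(seq, qual) + "\n")
        dst.write(plus)
        dst.write(qual + "\n")  # qualities written back unchanged


# Usage: A and G sit at Q0 ('!') and become N; the existing N stays N.
fastq_in = io.StringIO("@read1\nACGTN\n+\n!I!II\n")
fastq_out = io.StringIO()
mask_fastq(fastq_in, fastq_out)
print(fastq_out.getvalue(), end="")
```

Streaming record by record keeps memory use constant, which matters for whole-run FASTQs; for gzipped input the same function can be given a text-mode handle from `gzip.open(path, "rt")`.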

Many thanks, Ivan