broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Base-specific Phred score filtering/output options in AnalyzeSaturationMutagenesis #8995

Open BenjaminWehnert1008 opened 1 month ago

BenjaminWehnert1008 commented 1 month ago

Hey there!

I am currently working on a project involving mutagenesis analysis using the AnalyzeSaturationMutagenesis tool. Our rationale is that a sequencing error is much more likely to produce a single nucleotide exchange than two or three nucleotide changes within one mutated codon, so it is important for us to adjust the quality standards based on the type of variant (single/double/triple exchanges).
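To make the rationale concrete, here is a minimal Python sketch of the arithmetic behind it and of the kind of filter we have in mind. It assumes that base-call errors are independent and that a Phred score Q corresponds to an error probability of 10^(-Q/10). The function names and the threshold values are hypothetical illustrations, not options of AnalyzeSaturationMutagenesis.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def prob_all_miscalled(quals) -> float:
    """Probability that every base in `quals` is miscalled at once,
    assuming errors at different positions are independent."""
    p = 1.0
    for q in quals:
        p *= phred_to_error_prob(q)
    return p


# Hypothetical thresholds: stricter for single exchanges, looser for
# double and triple exchanges, which are less likely to be pure errors.
MIN_QUAL_BY_N_CHANGES = {1: 30, 2: 20, 3: 20}


def codon_passes(ref_codon: str, read_codon: str, quals,
                 thresholds=MIN_QUAL_BY_N_CHANGES) -> bool:
    """Accept a codon call only if every mutated base meets the minimum
    quality required for that number of mutated bases in the codon."""
    changed = [i for i in range(3) if ref_codon[i] != read_codon[i]]
    if not changed:
        return True  # wild-type codon, nothing to filter
    min_q = thresholds[len(changed)]
    return all(quals[i] >= min_q for i in changed)
```

Under these assumptions, one miscalled base at Q30 has a probability of about 1e-3, while two simultaneous miscalls at Q30 have a probability of about 1e-6, which is why a stricter threshold for single exchanges seems reasonable to us.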

I would like to ask if there is any existing functionality within the AnalyzeSaturationMutagenesis tool to incorporate base-specific nucleotide quality into the analysis, especially in a way that allows for different quality thresholds based on the number of consecutively mutated bases in one codon. If such a feature is not available, I would appreciate any suggestions you may have for a potential workaround that would enable us to set a stricter quality threshold for single nucleotide variations in the current tool.

Thank you in advance!

Best, Benni

@MaximilianStammnitz @tedsharpe

MaximilianStammnitz commented 1 month ago

Hi @tedsharpe,

A quick follow-up question on this: how does the tool currently handle the Q-scores from overlapping portions of paired-end reads? @BenjaminWehnert1008 noticed that pre-merging the reads and integrating the Q-scores of both mates can help improve our performance for 1nt variants.
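For reference, this is roughly what we mean by dual Q-score integration, as a minimal Python sketch. It uses one common heuristic for overlapping base calls: agreeing calls reinforce each other (qualities are summed up to a cap), while disagreeing calls keep the higher-quality base with the quality difference. The function names and the cap of 60 are assumptions for illustration. This is not a description of how AnalyzeSaturationMutagenesis treats overlaps, which is exactly what we are asking about.

```python
def merge_overlap_base(b1: str, q1: int, b2: str, q2: int, max_q: int = 60):
    """Combine two calls for the same position from overlapping mates.

    Agreement: keep the base, add the qualities (capped at max_q).
    Disagreement: keep the higher-quality base, with the quality reduced
    by the quality of the conflicting call.
    """
    if b1 == b2:
        return b1, min(q1 + q2, max_q)
    if q1 >= q2:
        return b1, q1 - q2
    return b2, q2 - q1


def merge_overlap(seq1: str, quals1, seq2: str, quals2, max_q: int = 60):
    """Merge two already-aligned, equal-length overlap segments.

    seq2 is expected to be in the same orientation as seq1, i.e. the
    reverse mate has already been reverse-complemented.
    """
    if not (len(seq1) == len(seq2) == len(quals1) == len(quals2)):
        raise ValueError("overlap segments must have equal length")
    bases, quals = [], []
    for b1, q1, b2, q2 in zip(seq1, quals1, seq2, quals2):
        b, q = merge_overlap_base(b1, q1, b2, q2, max_q)
        bases.append(b)
        quals.append(q)
    return "".join(bases), quals
```

With this heuristic, a 1nt variant seen in both mates gains confidence, while a variant seen in only one mate ends up with a low merged quality and can be removed by a downstream quality filter.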

Best wishes, Max