PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Calibration of Scoring Model for NovaSeq Data #416

Open DarioS opened 3 years ago

DarioS commented 3 years ago

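Some poly-G insertions receive high quality scores and are annotated by RepeatMasker as Simple_repeat (G)n. However, they look very similar to no-calls: with two-colour chemistry a G is called when neither channel emits signal, so a cluster that has gone dark reads as a run of Gs.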

chr1    1902198     gridss0f_7215b     G    GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGG.    1554.88    PASS    ...    INSRMRC=Simple_repeat;INSRMRO=+;INSRMRT=(G)n
chr1    21818865    gridss2f_4400b     G    GGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG.    1593.27    PASS    ...    INSRMRC=Simple_repeat;INSRMRO=+;INSRMRT=(G)n
chr1    39904841    gridss3f_27546b    T    TGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTGTGGGGGGGGGGGGGGTTGGGGGGGTGTGGGGTCTGGGGGGG.    2050       PASS    ...    INSRMRC=Simple_repeat;INSRMRO=+;INSRMRT=(G)n
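One way to screen calls like these is a post-filter on the inserted sequence. The sketch below is a hypothetical heuristic, not something GRIDSS does: it assumes forward-orientation single-breakend ALTs of the form REF + inserted bases + `.`, as in the records above, and the function names and the `min_len` and `min_g_fraction` defaults are illustrative assumptions rather than calibrated values.

```python
# Hypothetical post-filter for poly-G single-breakend insertions.
# NOT part of GRIDSS; names and thresholds are illustrative assumptions.


def inserted_sequence(ref: str, alt: str) -> str:
    """Return the inserted bases of a forward single-breakend ALT.

    Assumes the '<REF><inserted>.' form seen in the records above;
    backward breakends ('.<inserted><REF>') and other ALTs are rejected.
    """
    if not alt.endswith(".") or not alt.startswith(ref):
        raise ValueError(f"unsupported breakend ALT: {alt!r}")
    return alt[len(ref):-1]


def g_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G (case-insensitive)."""
    if not seq:
        return 0.0
    return seq.upper().count("G") / len(seq)


def looks_like_poly_g_artifact(
    ref: str, alt: str, min_len: int = 20, min_g_fraction: float = 0.8
) -> bool:
    """Flag long insertions made up almost entirely of G."""
    seq = inserted_sequence(ref, alt)
    return len(seq) >= min_len and g_fraction(seq) >= min_g_fraction
```

With these defaults, all three records above would be flagged. A real filter would also need to handle backward breakends and would probably need to consider poly-C runs, which is how a poly-G read tail appears when the read aligns to the reverse strand.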

Does the quality score model account for NovaSeq's two-colour chemistry? Do you have technical replicate samples sequenced on both HiSeq and NovaSeq that could be used to benchmark the false positive rate on NovaSeq?
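The link between two-colour chemistry and poly-G calls can be illustrated with a toy base caller. This is a simplified sketch, not Illumina's actual base-calling algorithm: it assumes normalised red/green intensities and a single fixed threshold (both hypothetical), and uses the two-channel encoding in which C is labelled red, T is labelled green, A is labelled with both and G is unlabelled.

```python
# Toy model of two-channel (red/green) base calling as used on NovaSeq.
# Simplified illustration only: real base callers also correct for phasing
# and cross-talk and assign qualities. The threshold is an assumption.

# Channel pattern per base: (red_on, green_on)
TWO_CHANNEL_CODE = {
    (True, True): "A",    # signal in both channels
    (True, False): "C",   # red only
    (False, True): "T",   # green only
    (False, False): "G",  # no signal in either channel ("dark" base)
}


def call_base(red: float, green: float, threshold: float = 0.5) -> str:
    """Call one base from normalised channel intensities."""
    return TWO_CHANNEL_CODE[(red >= threshold, green >= threshold)]


def call_read(intensities, threshold: float = 0.5) -> str:
    """Call a read from a sequence of (red, green) intensity pairs."""
    return "".join(call_base(r, g, threshold) for r, g in intensities)
```

A cluster that stops emitting signal partway through the read, e.g. `call_read([(0.9, 0.9), (0.9, 0.1), (0.1, 0.9)] + [(0.0, 0.0)] * 5)`, is called as `ACTGGGGG`. Nothing in the called bases distinguishes the trailing Gs from real ones, which is one reason a scoring model calibrated on four-colour HiSeq data might not recognise them as artifacts.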

d-cameron commented 3 years ago

I agree, they do look very suspicious.

Does the quality score model account for NovaSeq's two-colour chemistry

It does not.

Do you have technical replicate samples sequenced using HiSeq and NovaSeq to benchmark false positive rate on NovaSeq?

I'll follow up to see if Hartwig has NovaSeq sequencing of COLO829.