PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

GRIDSS VCF FILTER #139

Closed bailiang89 closed 6 years ago

bailiang89 commented 6 years ago

Hi, I used GRIDSS to call SVs on a single sample and now have a VCF result. Could you tell me some criteria for filtering the SVs? Thank you.

d-cameron commented 6 years ago

> Could you tell me some filter criterion to filter SV

This depends on how conservative you want to be and what your coverage is. The default call set passing all filters (FILTER="." or FILTER="PASS") is reasonable, but you might want to up the QUAL threshold to 1000 if you want a more conservative call set.
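A minimal sketch of that default filter, assuming an uncompressed, tab-separated VCF read line by line. The function names and the `min_qual` parameter are illustrative and not part of GRIDSS; the 1000 default is simply the conservative threshold suggested above.

```python
def passes_default_filters(vcf_line, min_qual=1000.0):
    """Return True if a VCF data line has FILTER '.' or 'PASS' and QUAL >= min_qual."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual_text, filter_text = fields[5], fields[6]  # VCF columns 6 (QUAL) and 7 (FILTER)
    if filter_text not in (".", "PASS"):
        return False
    if qual_text == ".":  # missing QUAL cannot meet a threshold
        return False
    return float(qual_text) >= min_qual


def filter_vcf_lines(lines, min_qual=1000.0):
    """Yield header lines unchanged, plus data lines that pass the filter."""
    for line in lines:
        if line.startswith("#") or passes_default_filters(line, min_qual):
            yield line
```

Passing a lower `min_qual` gives the less conservative call set; the FILTER check stays the same either way.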

bailiang89 commented 6 years ago

Thank you for answering my question. I would like to know whether there are any other criteria for filtering SVs. When I read the paper "GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly", I found a stricter set of criteria in the supplemental material:

• Variants not in the GRIDSS high confidence call set were filtered.
• dGV (Database of Genomic Variants) hits within 10bp were filtered.
• Alternate contig variants were filtered.
• Variants less than 1Mb were filtered.
• Translocation under 5kbp were filtered.
• Variants without split read, read pair, and assembly support were filtered.
• Telomeric and centromeric variants were filtered.
• Variants with a NCBI blast hit (all human sequences) spanning the breakpoint were filtered.

But I don't really understand these criteria. Could you give me some details about them? Thank you.

d-cameron commented 6 years ago

Those criteria were for calling large somatic mutations without a matched normal. They were only applied to the data set where we have WGS sequence of a tumour without a matched normal.

If you just have a germline sample, those criteria will likely filter out (almost) all of your real events, as there are very few events larger than 1Mb in a germline sample. For germline, you'd do well to just filter out anything over 1Mb and all interchromosomal calls, as they're likely to be false positives when you're looking at germline samples.
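The germline suggestion above can be sketched as follows, assuming the calls are reported in standard VCF breakend notation (ALT values such as `N[chr2:12345[`). The helper names are hypothetical, and keeping records whose ALT has no parsable mate is an assumption of this sketch, not something stated in the thread.

```python
import re

# Mate locus inside a VCF breakend ALT, e.g. N[chr2:12345[ or ]chr2:12345]N.
# The greedy contig group backtracks to the last colon, so names with colons work.
MATE_PATTERN = re.compile(r"[\[\]]([^\[\]]+):(\d+)[\[\]]")


def mate_of(alt):
    """Return (chrom, pos) of the partner breakend, or None if ALT has no mate."""
    match = MATE_PATTERN.search(alt)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def keep_for_germline(chrom, pos, alt, max_span=1_000_000):
    """Drop interchromosomal calls and intrachromosomal calls spanning over max_span."""
    mate = mate_of(alt)
    if mate is None:
        return True  # assumption: records without a mate are left for other handling
    mate_chrom, mate_pos = mate
    if mate_chrom != chrom:
        return False  # interchromosomal
    return abs(mate_pos - pos) <= max_span
```

For example, `keep_for_germline("chr1", 1000, "N[chr1:5000[")` keeps a 4 kb event, while the same call with a mate on `chr2` is dropped.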

bailiang89 commented 6 years ago

First, my project is targeted panel sequencing, calling SVs in both single-sample and multi-sample modes. The sample types are tumour or normal. Do those criteria fit my project? Second, as you say, the criteria differ between single-sample and multi-sample calling, don't they? Third, for my project, could you give some more details about filtering SVs other than by QUAL? Thank you.

d-cameron commented 6 years ago

> the criteria is different from single sample and multi sample, isn't it?

Multi-sample calling by itself is fairly straightforward. Somatic calling requires you to remove events found in just the normal (and keep events found only in the normal when considering somatic LoH). With an uncontaminated normal, this is fairly straightforward, as you can just use the per-sample QUAL scores in the genotype fields.
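A rough sketch of using the per-sample QUAL scores for a tumour/normal pair. It assumes the FORMAT column carries a key named `QUAL`, that the normal is the first sample column and the tumour the second, and that the normal is uncontaminated. The function names and both thresholds are hypothetical values for illustration, not recommendations from the thread.

```python
def sample_qual(format_field, sample_field):
    """Extract the per-sample QUAL value from one genotype column."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return float(values[keys.index("QUAL")])


def is_somatic(vcf_line, normal_index=0, tumour_index=1,
               min_tumour_qual=500.0, max_normal_qual=0.0):
    """True if the tumour supports the call and the normal shows no support."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_field = fields[8]   # FORMAT column
    samples = fields[9:]       # one column per sample
    normal = sample_qual(format_field, samples[normal_index])
    tumour = sample_qual(format_field, samples[tumour_index])
    return tumour >= min_tumour_qual and normal <= max_normal_qual
```

Swapping the two index arguments selects events with support only in the normal, which is the case mentioned above for somatic LoH.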

> Third, about my project, could you give some more details about filter SV except considering QUAL.

The current GRIDSS model considers each read as completely independent. This means that QUAL scores have a linear relationship with coverage, so higher coverage will produce higher QUAL scores.
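One possible consequence of that linear relationship is that a QUAL cutoff tuned at one coverage could be rescaled proportionally for another. The sketch below only illustrates that arithmetic; the function name and the coverage figures are assumptions, and this is not a documented GRIDSS procedure.

```python
def scaled_qual_threshold(base_threshold, base_coverage, sample_coverage):
    """Rescale a QUAL cutoff in proportion to coverage, assuming QUAL is linear in coverage."""
    if base_coverage <= 0:
        raise ValueError("base_coverage must be positive")
    if sample_coverage < 0:
        raise ValueError("sample_coverage must be non-negative")
    return base_threshold * (sample_coverage / base_coverage)


# Hypothetical example: a cutoff of 1000 chosen at 40x, applied to an 80x sample.
threshold_at_80x = scaled_qual_threshold(1000.0, 40.0, 80.0)
```

For targeted panels, where coverage varies strongly between regions, a single genome-wide figure for `sample_coverage` may be a poor fit.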