reasonable prefiltering of CNV calls

Thanks for the interest in my work. The supplement of the Science paper details the SV filtering we applied. These filters (at least the segmental duplication filter) are also typically used in microarray analyses of CNVs.

As a preliminary filtering step, SVs were removed from the consensus callset if they overlapped by more than 66% with centromeres, segmental duplications, regions with low mappability with 100bp reads, or regions subject to somatic V(D)J recombination (parts of antibody and T-cell receptor genes). SVs called by Manta or Lumpy were filtered if they had one or both breakpoints overlapping one of these regions. Regions used for filtering can be found in our previous publication [Brandler, Antaki, Gujral AJHG 2016].

You can find BED files of these features on the UCSC Table Browser for the reference build of your choice. The low mappability regions used in the publication were derived from the DAC Blacklist, but I would now recommend using the UMAP track (k24 would be a more stringent filter, while k100 would be more lenient).
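If it helps, here is a minimal Python sketch of the 66% overlap filter described above. It is not the code used for the paper. The function names are made up for illustration, and it assumes the mask regions have already been read from BED files into `(chrom, start, end)` tuples and that overlap is measured against the union of all mask regions (the supplement may define it per feature).

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def build_mask(regions):
    """Group (chrom, start, end) mask regions by chromosome and merge them."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    return {chrom: merge_intervals(iv) for chrom, iv in by_chrom.items()}


def masked_fraction(sv, mask):
    """Fraction of the SV's length that is covered by mask regions."""
    chrom, start, end = sv
    length = end - start
    if length <= 0:
        return 0.0
    covered = sum(
        max(0, min(end, m_end) - max(start, m_start))
        for m_start, m_end in mask.get(chrom, [])
    )
    return covered / length


def passes_overlap_filter(sv, mask, max_fraction=0.66):
    """Keep the SV unless more than max_fraction of it lies in the mask."""
    return masked_fraction(sv, mask) <= max_fraction
```

For example, with `mask = build_mask(regions)`, a call `("chr1", 0, 400)` that is half covered by masked sequence is kept, while a call lying entirely inside a segmental duplication is dropped. In practice a tool such as bedtools does the same job on BED files directly.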

You can also remove SVs that are extremely large. LUMPY and Manta tend to call SVs that are nearly the size of a chromosome, I think because of repetitive telomeric sequence. Many of these SV calls would be lethal (monosomies or trisomies of chr1, for example), so you can remove them, depending on the context of your study. In ASD, we don't really expect germline SVs to be larger than 25-30Mb, so we typically remove SVs larger than that. Larger SVs also take SV² longer to process, so keep that in mind if time matters to you.

I hope this helps. If you would like more information on how to process SVs in WGS, I would check out our publications, Sudmant 2015 Nature, and the most recent 1000 Genomes SV analysis on bioRxiv (Chaisson 2017).
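The size cutoff is easy to apply before running SV². A rough sketch, assuming calls are `(chrom, start, end)` tuples and using 30Mb as the default (the function name and the exact cutoff are illustrative, so pick a value that suits your study):

```python
MAX_SV_LENGTH = 30_000_000  # 30Mb, assumed upper bound for germline SVs


def drop_large_svs(svs, max_length=MAX_SV_LENGTH):
    """Return only the SVs whose length (end - start) is at most max_length."""
    return [sv for sv in svs if sv[2] - sv[1] <= max_length]
```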
