etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

Duplications detected in antitarget regions. #401

emiAG closed this issue 5 years ago

emiAG commented 5 years ago

Hello,

We ran CNVkit 0.8.5 on a large panel (around 300 genes), and for one sample it called a duplication in an off-target region: chr9:43921813-47376826. In the .cnr file, there are no reads at all between chr9:47376826 and chr9:65402984. In the segment file (.cns), this read-free region artificially creates a drop between itself and the region called as a duplication.

Chr start end gene log2 depth weight
chr9 41968979 43921813 - 0.918313 0.319901 13 5.88461
chr9 43921813 47376826 - 0.333689 0.21323 20 9.7614
chr9 47376826 66754946 - 0.00679507 0.0064292 127 72.5873
chr9 66754946 70961049 - 0.764571 0.481092 25 10.5135
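
A gap like the one described above can be confirmed directly from bin coordinates. The following is a minimal sketch, not part of CNVkit: `find_gaps` and the 1 Mb `min_gap` cutoff are hypothetical, and the toy intervals simply reuse the coordinates quoted in this post rather than real .cnr rows.

```python
from typing import Iterable, List, Tuple

Bin = Tuple[str, int, int]  # (chromosome, start, end)


def find_gaps(bins: Iterable[Bin], min_gap: int = 1_000_000) -> List[Bin]:
    """Return uncovered stretches of at least `min_gap` bases between bins.

    `min_gap` is an arbitrary illustrative cutoff, not a CNVkit setting.
    """
    gaps: List[Bin] = []
    prev_chrom, prev_end = None, 0
    for chrom, start, end in sorted(bins):
        if chrom != prev_chrom:
            # First bin on a new chromosome: nothing to compare against yet.
            prev_chrom, prev_end = chrom, end
            continue
        if start - prev_end >= min_gap:
            gaps.append((chrom, prev_end, start))
        prev_end = max(prev_end, end)
    return gaps


# Toy intervals built from the coordinates mentioned in this post.
example_bins = [
    ("chr9", 41968979, 43921813),
    ("chr9", 43921813, 47376826),
    ("chr9", 65402984, 66754946),
    ("chr9", 66754946, 70961049),
]

gaps = find_gaps(example_bins)
for chrom, start, end in gaps:
    print(f"{chrom}:{start}-{end} ({end - start} bp without bins)")
```

With real data, the three coordinate columns of the .cnr file would be read into `bins` instead of the toy list.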

Why does CNVkit call CNVs in off-target regions? Is there a way to get rid of this kind of artifactual CNV?

Many thanks in advance for your help.

etal commented 5 years ago

For a capture panel I recommend using genemetrics as well to help summarize CNV status per gene. That will probably exclude the pericentromeric regions, which can be quirky.
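
As a rough illustration of the suggestion above, here is a small sketch that assembles a `genemetrics` command line from Python. The helper name `genemetrics_command`, the file names, and the threshold and minimum-probe values are placeholders chosen for the example, not recommendations. Older CNVkit releases may expose this command under the name `gainloss`, so check `cnvkit.py --help` for the installed version.

```python
import shlex
from typing import List


def genemetrics_command(
    cnr_path: str,
    cns_path: str,
    out_path: str,
    threshold: float = 0.2,   # illustrative log2 cutoff, tune for your assay
    min_probes: int = 3,      # illustrative minimum number of bins per gene
) -> List[str]:
    """Build the argument list for a per-gene CNV summary (hypothetical helper)."""
    return [
        "cnvkit.py", "genemetrics", cnr_path,
        "-s", cns_path,          # segments to use alongside the bin-level ratios
        "-t", str(threshold),
        "-m", str(min_probes),
        "-o", out_path,
    ]


cmd = genemetrics_command("Sample.cnr", "Sample.cns", "Sample.genemetrics.tsv")
print(shlex.join(cmd))
# To actually run it (requires CNVkit on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the list first and running it separately makes it easy to inspect the exact command before launching it on a batch of samples.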

In your example table, it looks like the "gene" column was lost, and the remaining columns shifted over somehow.

Does your panel use targeted amplicon sequencing or hybrid capture? If the latter, then there are off-target reads that CNVkit uses to supplement its CNV calls. If the former, then the off-target reads are absent and some other considerations come into play: https://cnvkit.readthedocs.io/en/stable/nonhybrid.html