abyzovlab / CNVnator

a tool for CNV discovery and genotyping from depth-of-coverage by mapped reads

CNVnator result filtering #30

Closed Gerde closed 3 years ago

Gerde commented 8 years ago

Dear Alexej Abyzov,

I am using CNVnator v0.3.2 for CNV calling. After the run finished I got a raw call result. How can I filter it? I also have a doubt about the definition of duplication and deletion: is normalized_RD > 1 defined as a duplication, and normalized_RD < 1 as a deletion? If so, I got results like this:

deletion chr1:122613801-122624200 10400 2.22569 15649.7 2.86171e+09 12154.1 2.8635e+09 0.627206
deletion chr1:122763901-122777200 13300 1.46077 18102.9 2.85913e+09 10231.4 2.86091e+09 0.520318
deletion chr1:122872301-122876900 4600 1.55713 117590 2.86689e+09 81973.3 2.86868e+09 0.399678
deletion chr1:123010201-123013300 3100 1.47059 231779 2.86823e+09 154720 2.86898e+09 0.371232
deletion chr1:123089801-123091700 1900 1.28593 528143 2.8693e+09 1 1 0.582763
deletion chr1:123274601-123278000 3400 1.0499 393230 2.86796e+09 358119 2.86975e+09 0.517699
deletion chr1:123341601-123352500 10900 1.34957 39799.6 2.86127e+09 25483.9 2.86305e+09 0.47585
deletion chr1:123946601-123951500 4900 3.25128 36370 2.86662e+09 37502.8 2.86841e+09 0.422769
deletion chr1:124144801-124148100 3300 1.28625 230407 2.86805e+09 58904 2.86984e+09 0.656344
deletion chr1:124716101-124719700 3600 2.19913 58635 2.86778e+09 49175.7 2.86957e+09 0.393118

Is this normal? Looking forward to your reply!

Cheers, Gerde

abyzov commented 8 years ago

Hi, in my experience most false positives come from repetitive regions, so the most confident calls are those with q0 < 0.5 (see the README on how to generate it). Larger events are more reliable. Regions with fewer repeats and segmental duplications are also more reliable (but this is taken care of by the q0 filtering).

No, your results don't look normal. Maybe the bin size is too small?
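The q0 and size filtering suggested above could be sketched roughly as follows. This is a minimal illustration, not part of CNVnator: it assumes each call line has nine whitespace-separated fields in the order seen in the output quoted in this thread (type, coordinates, size, normalized_RD, four e-values, q0 last). The names `Call`, `parse_call`, `filter_calls` and the `min_size` threshold are hypothetical; the thread gives no specific size cutoff.

```python
from typing import Iterable, List, NamedTuple


class Call(NamedTuple):
    cnv_type: str
    region: str
    size: int
    normalized_rd: float
    q0: float


def parse_call(line: str) -> Call:
    # Assumed layout: type, coordinates, size, normalized_RD,
    # four e-values, q0 (last column).
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    return Call(
        cnv_type=fields[0],
        region=fields[1],
        size=int(float(fields[2])),
        normalized_rd=float(fields[3]),
        q0=float(fields[8]),
    )


def filter_calls(
    lines: Iterable[str], max_q0: float = 0.5, min_size: int = 0
) -> List[Call]:
    # Keep calls with q0 below the threshold and at least min_size bp long.
    # min_size is a hypothetical knob; pick a value suited to your data.
    calls = (parse_call(line) for line in lines if line.strip())
    return [c for c in calls if c.q0 < max_q0 and c.size >= min_size]


if __name__ == "__main__":
    raw = [
        "deletion chr1:122613801-122624200 10400 2.22569 15649.7 "
        "2.86171e+09 12154.1 2.8635e+09 0.627206",
        "deletion chr1:122872301-122876900 4600 1.55713 117590 "
        "2.86689e+09 81973.3 2.86868e+09 0.399678",
    ]
    for call in filter_calls(raw):
        print(call.region, call.size, call.q0)
```

Parsing into a named tuple keeps the column assumption in one place, so if your CNVnator version emits a different column layout only `parse_call` needs to change.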

Alexej Abyzov, Ph.D. Senior Associate Consultant, Assistant Professor of Biomedical Informatics, Department of Health Sciences Research,

Center for Individualized Medicine, Mayo Clinic

Mayo Clinic, Harwick 3-12 200 1st street SW, Rochester, MN 55905 tel: +1-(507)-538-0978 fax: +1-(507)-284-0745

standielpls commented 7 years ago

Can you elaborate on what makes this output look abnormal? Other than the q0 column, what are some dead giveaways that the results are incorrect? Is it that there are so many CNVs one after the other along chr1?

abyzov commented 7 years ago

Hi, previously I had the impression that you had too many calls in a small genomic region. On a second look, the region is large (a few Mb) and the output could be OK. Please make sure that the ratio of the average read depth to its standard deviation is around 4-5.
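The ratio check mentioned above is simple arithmetic, sketched here under stated assumptions. CNVnator derives these statistics from its own read-depth histogram; this snippet only illustrates the calculation for a list of per-bin read-depth values you have obtained some other way. The function names and the strict 4.0 to 5.0 bounds are hypothetical, since the advice says only "around 4-5".

```python
import statistics
from typing import Sequence


def mean_to_sd_ratio(bin_rd: Sequence[float]) -> float:
    # Ratio of the average per-bin read depth to its standard deviation.
    mean = statistics.fmean(bin_rd)
    sd = statistics.pstdev(bin_rd)
    if sd == 0:
        raise ValueError("standard deviation is zero; ratio is undefined")
    return mean / sd


def bin_size_looks_reasonable(
    bin_rd: Sequence[float], low: float = 4.0, high: float = 5.0
) -> bool:
    # Hypothetical hard bounds for "around 4-5"; treat them as a rough guide.
    return low <= mean_to_sd_ratio(bin_rd) <= high
```

A ratio well below this range suggests the per-bin signal is noisy relative to its mean, which is consistent with the earlier suggestion that the bin size may be too small.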
