CNVnator mapping quality zero

chirrie commented 4 years ago

I used CNVnator to identify CNVs in whole-genome sequence data, and I am a bit stuck on the actual reason for applying the q0 > 50% filter. I have read and re-read the paper, but I still do not understand it.

My understanding of q0 is this: a read of, say, 100 bp should ideally map to one location, but because of repetitive regions in the genome or the alignment tool used, it may align equally well at more than one location, in which case one location is selected at random; that is the idea of q0. But the whole notion of a CNV is that there will be two or more segments, above 1000 bp for instance, with identical sequences, meaning reads in CNVs will map with q0. So why do we have to use the 50% cutoff? I do not understand how the 50% (q0 < 0.5) came about.

abyzov commented 4 years ago

Hello, please see figure S2 in the supplement of

CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing

https://genome.cshlp.org/content/21/6/974.long

Alexej Abyzov, Ph.D. Senior Associate Consultant, Assistant Professor of Biomedical Informatics, Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic

Mayo Clinic, 200 1st Street SW, Harwick 3-12, Rochester, MN 55905 http://www.abyzovlab.org tel: +1-(507)-538-0978 fax: +1-(507)-284-0745

chirrie commented 4 years ago

Dear Prof. Abyzov,

I have read the article and its supplementary documents, including figure S2 on q0, but I still have a problem understanding why we have to apply the 50% cutoff to called CNVs.

Let me rephrase my question. I understand that the whole point of applying q0 is to deal with mappability biases arising from the repetitive nature of the reference genome. That is, if a read in the test sample maps to two or more locations in the reference genome, one is selected at random and the read is normally assigned a mapping quality of zero (q0). We know that q0 reads are commonly found in CNV regions. From figure S2, you mention that the fraction of q0 reads in CNV regions is distributed between 0 and 100%, and that calls with a fraction above 50% are redundant. This is where I am getting mixed up. Does it mean that if the genome has 200 reads with zero mapping quality in CNV regions, and CNVR1, for example, has 110 of them, then we filter out this CNV region since q0 > 50%, implying these reads are repeats? And if CNVR2 has 50 reads with zero mapping quality, we retain it since q0 < 50%? Is that the right explanation? If so, I think this takes care of repeats in the reference genome, right? Does it then mean that the q0 filter for deletions takes care of repeats in the test sample?

abyzov commented 4 years ago

Hi, I am not sure what you mean by CNVR1 and CNVR2, but if a region has a high fraction of q0 reads, then read depth estimation in that region is not reliable. Effectively, yes: selecting calls with a q0 fraction < 50% filters out repeats.
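As a minimal sketch of this filter in Python: the q0 value is a per-call fraction (q0 reads in the region divided by all reads in that region), and calls below the cutoff are kept. The sketch assumes the tab-separated call layout described in the CNVnator README, with q0 in the ninth column; the example lines and the names `q0_fraction`, `parse_call`, and `keep_call` are made up for illustration and are not part of CNVnator.

```python
from typing import NamedTuple


class Call(NamedTuple):
    cnv_type: str        # "deletion" or "duplication"
    region: str          # e.g. "chr1:2000-3000"
    normalized_rd: float
    q0: float            # fraction of reads in the call with mapping quality 0


def q0_fraction(n_q0_reads: int, n_reads: int) -> float:
    """Fraction of reads in ONE region that have mapping quality zero."""
    return n_q0_reads / n_reads


def parse_call(line: str) -> Call:
    # Assumed column order (check against your own output):
    # type, coordinates, size, normalized_RD, e-val1..e-val4, q0
    f = line.split()
    return Call(f[0], f[1], float(f[3]), float(f[8]))


def keep_call(call: Call, max_q0: float = 0.5) -> bool:
    """Keep a call only if its q0 fraction is below the cutoff."""
    return call.q0 < max_q0


# Made-up example lines, not real CNVnator output.
EXAMPLE_LINES = [
    "deletion\tchr1:2000-3000\t1001\t0.45\t1e-10\t1e-12\t1e-9\t1e-11\t0.12",
    "duplication\tchr2:5000-6000\t1001\t1.60\t1e-8\t1e-9\t1e-7\t1e-8\t0.55",
]

if __name__ == "__main__":
    for call in map(parse_call, EXAMPLE_LINES):
        print(call.region, "keep" if keep_call(call) else "filter out")
```

Under these assumptions, a region with 200 reads of which 110 have mapping quality zero has q0 = 110 / 200 = 0.55 and is filtered out, regardless of how many q0 reads exist elsewhere in the genome.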


chirrie commented 4 years ago

Hi,

I just used CNVR1 and CNVR2 as examples of CNV regions in a given species, say CNVs at chr1:2000-3000 or chr2:5000-6000.

So I am wondering: with the q0 > 50% filter, is there not a possibility of filtering out real CNV regions, especially duplications? And what is the trade-off if I decide to use, say, q0 > 20%?

abyzov commented 4 years ago

Hello, with any filter there is a possibility of missing real variants. The purpose of the q0 filter is to exclude regions with unreliable read mapping, where CNV calling is prone to false positives. You can generate a plot similar to the one in Fig. S2 and choose a cut-off that seems reasonable to you.
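One way to explore this trade-off, sketched in Python under stated assumptions: bin the q0 fractions of your own calls (the raw counts behind a Fig. S2-style histogram) and count how many deletions and duplications survive each candidate cutoff. The function names and the example values are hypothetical, and each q0 value is assumed to lie between 0 and 1.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def q0_histogram(q0_values: Iterable[float], n_bins: int = 10) -> List[int]:
    """Count calls per q0 bin; assumes every value lies in [0, 1]."""
    counts = [0] * n_bins
    for q in q0_values:
        # A value of exactly 1.0 goes into the last bin.
        idx = min(int(q * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


def retained_per_cutoff(
    calls: Iterable[Tuple[str, float]], cutoffs: Iterable[float]
) -> Dict[float, Counter]:
    """For each cutoff, count calls per CNV type with q0 below the cutoff."""
    calls = list(calls)
    return {c: Counter(t for t, q in calls if q < c) for c in cutoffs}


# Made-up (cnv_type, q0) pairs for illustration only.
EXAMPLE_CALLS = [
    ("deletion", 0.02),
    ("deletion", 0.12),
    ("duplication", 0.25),
    ("duplication", 0.45),
    ("duplication", 0.82),
    ("deletion", 0.95),
]

if __name__ == "__main__":
    print(q0_histogram(q for _, q in EXAMPLE_CALLS))
    for cutoff, kept in retained_per_cutoff(EXAMPLE_CALLS, [0.2, 0.5]).items():
        print(cutoff, dict(kept))
```

With these made-up values, tightening the cutoff from 0.5 to 0.2 removes both retained duplications while keeping the same two deletions, which is the kind of effect worth checking on real calls before settling on a threshold.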



C4t3 commented 4 years ago

Hello, I just wanted to comment on this and ask for your opinion. When CNVnator calls a homozygous deletion, it is possible that only a few reads map in the deleted region and that they have low MAPQ. Applying the hard filter q0 < 0.5, you would miss these variants.

Thanks for your work, and for any reply. Cheers, Caterina

abyzov commented 4 years ago

Hi, I didn’t think about this possibility, but I think yes, it is possible.

Alexej Abyzov, Ph.D. Senior Associate Consultant, Associate Professor of Biomedical Informatics, Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic

Mayo Clinic, 200 1st Street SW, Harwick 3-12, Rochester, MN 55905 http://www.abyzovlab.org tel: +1-(507)-538-0978 fax: +1-(507)-284-0745
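The homozygous-deletion case can be illustrated with a small Python sketch. It shows how a handful of stray MAPQ 0 reads in an otherwise empty region pushes the q0 fraction above 0.5, and one possible workaround: marking such deletions for manual review instead of dropping them. This is not something CNVnator does; the `triage` function and the `max_rd_homdel` threshold of 0.1 are hypothetical choices for illustration.

```python
from typing import List, Optional


def q0_fraction(mapqs: List[int]) -> Optional[float]:
    """Fraction of reads with MAPQ 0; None when the region has no reads."""
    if not mapqs:
        return None
    return sum(1 for q in mapqs if q == 0) / len(mapqs)


def triage(
    cnv_type: str,
    normalized_rd: float,
    q0: Optional[float],
    max_q0: float = 0.5,          # cutoff discussed in this thread
    max_rd_homdel: float = 0.1,   # hypothetical "almost no coverage" threshold
) -> str:
    """Return 'keep', 'review', or 'drop' for a single call."""
    if q0 is not None and q0 < max_q0:
        return "keep"
    # High or undefined q0: a near-zero read depth deletion may still be a
    # real homozygous deletion, so flag it rather than discard it.
    if cnv_type == "deletion" and normalized_rd < max_rd_homdel:
        return "review"
    return "drop"


if __name__ == "__main__":
    # Three stray reads in a deleted region, two of them with MAPQ 0.
    q0 = q0_fraction([0, 0, 3])
    print(q0, triage("deletion", 0.03, q0))
```

Here two of three reads have MAPQ 0, so q0 is about 0.67 and the hard filter alone would discard the call, while the sketch marks it for review because its normalized read depth is close to zero.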

