abyzovlab / CNVnator

a tool for CNV discovery and genotyping from depth-of-coverage by mapped reads

Bin size and sequence read length? #270

Closed devin-qiu closed 2 years ago

devin-qiu commented 2 years ago

Hello Dr. Abyzov,

Thank you for developing CNVnator! I have some questions regarding this software:

  1. Does the input bin size need to match the sequencing read length? For example, our 30x sequencing samples have a read length of ~150 bp; should we set bin_size to 150?

  2. I already generated a CNV call set from the child's sequencing data in a trio. Since I also have data for the parents, can we filter for de novo CNVs in the proband instead of keeping the whole call set, which may include CNVs inherited from the parents?

  3. The partitioning step is the most time-consuming step, so we want to avoid rerunning it when the input sequencing file and bin_size stay the same. Does CNVnator generate an intermediate file with the partitioning results? If so, where is it stored?

Looking forward to hearing from you, Devin

abyzov commented 2 years ago

Hi Devin, below are answers to your questions.

  1. Yes, the size of bins depends on coverage and read length. For your data I would consider bins of 200-500 bp.
  2. Yes, you can genotype the CNV regions in the parents and then compare the genotypes to call de novo CNVs. See Abyzov et al., Nature 2012, PMID:23160490.
  3. Yes, all the info (after each step) is stored in the .root file.
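
For anyone landing on this thread later, here is a minimal sketch of the comparison described in answer 2. It assumes you already have a normalized copy-number estimate for each of the child's CNV regions in the child and in both parents (for example, from CNVnator's genotyping step). The `RegionGenotype` layout, the function names, the 0.5 tolerance, and the sample values are all illustrative assumptions; none of them come from CNVnator or from the cited paper.

```python
from typing import List, NamedTuple


class RegionGenotype(NamedTuple):
    """Copy-number estimates for one region in a trio (hypothetical layout)."""
    region: str
    child: float
    father: float
    mother: float


def classify(cn: float, tol: float = 0.5) -> str:
    """Label a copy-number estimate relative to the diploid expectation of 2.

    tol is an assumed threshold for illustration, not a CNVnator default.
    """
    if cn < 2.0 - tol:
        return "deletion"
    if cn > 2.0 + tol:
        return "duplication"
    return "normal"


def candidate_de_novo(genotypes: List[RegionGenotype], tol: float = 0.5) -> List[str]:
    """Return regions where the child is non-normal and both parents look diploid."""
    hits = []
    for g in genotypes:
        if (classify(g.child, tol) != "normal"
                and classify(g.father, tol) == "normal"
                and classify(g.mother, tol) == "normal"):
            hits.append(g.region)
    return hits


# Made-up values, for illustration only.
trio = [
    RegionGenotype("chr1:1000001-1050000", child=1.02, father=1.97, mother=2.05),
    RegionGenotype("chr2:5000001-5020000", child=0.98, father=1.01, mother=2.00),
    RegionGenotype("chr7:300001-340000", child=3.10, father=2.10, mother=1.90),
]
print(candidate_de_novo(trio))
```

With these made-up values, the chr2 region is dropped because the father also carries the deletion. The diploid expectation of 2 only holds for autosomes, so sex chromosomes would need separate handling, and any candidates are worth checking by eye before being reported as de novo.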

Alexej Abyzov, Ph.D. Senior Associate Consultant, Associate Professor of Biomedical Informatics, Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic

Mayo Clinic, 200 1st Street SW, Harwick 3-12, Rochester, MN 55905 www.abyzovlab.org tel: +1-(507)-538-0978
