abyzovlab / CNVpytor

a python extension of CNVnator -- a tool for CNV analysis from depth-of-coverage by mapped reads
MIT License

Filtering and genotyping questions #169

Closed by zainabae 7 months ago

zainabae commented 1 year ago

Hello, I have a couple of questions:

  1. In view mode, when I set the CNV size range to, for example, 10,000 inf, I still get CNVs with smaller sizes (4,500, 9,000, and 8,500). Is this normal?

  2. Also, if a region has a copy number of 16 or 17 in all my samples, is it considered a false positive? Can I remove it manually, or do I have to change the parameters until it is removed?

  3. The genotyping step is performed by default in view mode when filtering CNVs and merging samples, isn't it?

  4. If I have three sets of samples, each with a different mean depth of coverage, I will use different bin sizes. Can I merge them later to compare them?

suvakov commented 1 year ago

  1. No, it is not normal. Which caller are you using? We will check for potential reasons.

  2. It is usually related to mapping and happens close to the centromere or in some highly repetitive regions. If filtering by the distance from the gap in RD (dG_range parameter) does not work, I suggest manual filtering by RD level (this filter is not available in the current version, but we are considering adding it).

  3. Yes, in the merged output, the numbers are the result of the genotype function.

  4. No, the current merging functionality assumes the same bin size, but this is a good suggestion for future updates.

Thank you for the questions and feedback.

zainabae commented 1 year ago

Thank you so much for the informative response.

Actually, in the beginning I didn't use VCF files; I only ran the steps with BAM files. I added the VCF data to my pytor files later (variant calling was done with HaplotypeCaller from GATK-4.2.6.1).

But I am still facing the same issue: some CNVRs are smaller than the size range selected in the filtering step.

> 1. No, it is not normal. Which caller are you using? We will check for potential reasons.

Can you recommend some tools for manual filtering by RD level?

> 2. It is usually related to mapping and happens close to the centromere or in some highly repetitive regions. If the distance from the gap in RD (dG_range parameter) does not work, I suggest manual filtering by RD level (this filter is not available in the current version, but we are considering adding it).
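
Until such a filter exists, the manual filtering by size and RD level discussed above could be done with a few lines of Python on an exported call table. This is only a minimal sketch under stated assumptions, not part of CNVpytor: it assumes a tab-separated file whose first four columns are CNV type, region (`chrom:start-end`), size, and RD level normalized so that 1.0 corresponds to two copies. The function names, the `max_level` cutoff of 4.0, and the example rows are hypothetical and chosen for illustration only.

```python
import csv
import io


def parse_region(region):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, span = region.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def filter_calls(lines, min_size=10_000, max_size=float("inf"), max_level=None):
    """Keep calls whose size is within [min_size, max_size] and whose
    normalized RD level does not exceed max_level (if given).

    Assumed column layout (tab-separated): type, region, size, level, ...
    """
    kept = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        cnv_type, region, level = row[0], row[1], float(row[3])
        chrom, start, end = parse_region(region)
        # Recompute the size from the coordinates (assumed inclusive)
        # so the check does not rely on the reported size column.
        size = end - start + 1
        if not (min_size <= size <= max_size):
            continue
        if max_level is not None and level > max_level:
            continue
        kept.append((cnv_type, chrom, start, end, size, level))
    return kept


# Made-up rows for illustration; replace with open("calls.tsv") for real data.
example = io.StringIO(
    "deletion\tchr1:100001-104500\t4500\t0.52\n"
    "duplication\tchr1:200001-230000\t30000\t1.48\n"
    "duplication\tchr2:500001-560000\t60000\t8.10\n"
)
calls = filter_calls(example, min_size=10_000, max_level=4.0)
for call in calls:
    print(call)
```

With these example rows, the 4,500 bp deletion is dropped by the size filter and the call with level 8.10 (roughly 16 copies under the normalization assumed here) is dropped by the RD-level filter. Running the same size check on the view-mode output is also a quick way to list exactly which calls fall below the requested range when reporting the problem.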