etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

CNVKit 0.9.9 Batch mode has inconsistent CN and log2Ratio #761

Open JD12138 opened 1 year ago

JD12138 commented 1 year ago

Hello, I have a pair of WGS samples (a normal and a tumor), and I used the following command line to detect CNVs:

    cnvkit.py batch $tumor --normal $normal \
        --fasta $reference \
        --annotate $refFlat \
        --output-dir $out/$sample \
        -p 20 -m wgs \
        --scatter --diagram

But the log2Ratio and the copy number in my sample.call.cns file are inconsistent. Here are the top 10 lines:

    log2         cn
    0.224267     3
    -0.207029    2
    0.0982319    2
    -0.222901    2
    0.140716     2
    -0.117483    2
    0.0339522    2
    -0.0921905   2
    0.157111     2

Could you tell me the reason? Or what is the definition of log2Ratio and cn in this file? Thank you!

tetedange13 commented 1 year ago

Hi @JD12138,

The steps leading to the sample.call.cns file are not very well documented:

  1. Raw log2 ratios (the ones contained in your sample.cns file) are filtered to remove likely false-positive segments (based on a confidence-interval calculation)
  2. Segments are median-centered, and the "p_ttest" column (the p-value of a t-test) is calculated
  3. Finally, integer copy number values (the "cn" column) are called from the log2 ratio values using the threshold method, with the following thresholds: -1.1, -0.25, 0.2, 0.7
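Steps 1 and 2 can be sketched in Python as below. This is a simplified illustration, not CNVkit's own code: the `Segment` class and the `ci_level`, `filter_by_ci` and `median_center` helpers are hypothetical. The sketch assumes the confidence-interval filter labels a segment as gain (CI entirely above 0), loss (CI entirely below 0) or neutral (CI overlapping 0) and then merges neighbouring segments with the same label. Centering uses a plain median of segment log2 values, and the t-test is left out.

```python
from dataclasses import dataclass, replace
from statistics import median


@dataclass
class Segment:
    chromosome: str
    start: int
    end: int
    log2: float
    ci_lo: float  # lower bound of the segment mean's confidence interval
    ci_hi: float  # upper bound of the segment mean's confidence interval
    weight: float = 1.0  # used to average log2 values when merging


def ci_level(seg):
    """+1 if the CI lies entirely above 0, -1 if entirely below 0, else 0."""
    if seg.ci_lo > 0:
        return 1
    if seg.ci_hi < 0:
        return -1
    return 0


def filter_by_ci(segments):
    """Step 1 (assumed behaviour): merge neighbours on the same chromosome
    that share a CI level, so only level-changing breakpoints are kept."""
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if (prev is not None and prev.chromosome == seg.chromosome
                and ci_level(prev) == ci_level(seg)):
            total = prev.weight + seg.weight
            merged[-1] = Segment(
                prev.chromosome, prev.start, seg.end,
                (prev.log2 * prev.weight + seg.log2 * seg.weight) / total,
                # Crude CI envelope for illustration; it preserves the level.
                min(prev.ci_lo, seg.ci_lo), max(prev.ci_hi, seg.ci_hi),
                total)
        else:
            merged.append(seg)
    return merged


def median_center(segments):
    """Step 2 (simplified): shift log2 and CI bounds so that the median
    segment log2 becomes 0. The p_ttest column is not computed here."""
    shift = median(s.log2 for s in segments)
    return [replace(s, log2=s.log2 - shift, ci_lo=s.ci_lo - shift,
                    ci_hi=s.ci_hi - shift) for s in segments]


segs = [Segment("chr1", 0, 100, 0.05, -0.1, 0.2),
        Segment("chr1", 100, 200, -0.05, -0.2, 0.1),
        Segment("chr1", 200, 300, 0.6, 0.4, 0.8),
        Segment("chr2", 0, 100, 0.5, 0.3, 0.7)]
centered = median_center(filter_by_ci(segs))
print([(s.chromosome, s.start, s.end, round(s.log2, 3)) for s in centered])
# [('chr1', 0, 200, -0.5), ('chr1', 200, 300, 0.1), ('chr2', 0, 100, 0.0)]
```

Here the two neutral chr1 segments are merged because their CIs both overlap 0, while the breakpoint at position 200 survives because the CI level changes there.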

To sum up, your observed CN and log2ratio values are not inconsistent: every time a log2ratio falls between -0.25 and 0.2, the called CN is "neutral", i.e. the reference ploidy (2 by default). Your first row, with log2ratio = 0.224267, is above the 0.2 threshold, so CN = 3 is called.
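Applying those thresholds to the rows posted above can be sketched as follows. `call_copy_number` is a hypothetical helper, not part of CNVkit. It assumes a diploid autosomal segment and treats each threshold as an inclusive upper bound; for values above 0.7 it falls back to rounding up `ploidy * 2**log2`, which is an assumption about how higher copy numbers are handled.

```python
import math

# Default log2 cutoffs quoted above; the i-th cutoff is the upper bound for CN = i.
DEFAULT_THRESHOLDS = (-1.1, -0.25, 0.2, 0.7)


def call_copy_number(log2, thresholds=DEFAULT_THRESHOLDS, ploidy=2):
    """Map a segment log2 ratio to an integer copy number (sketch).

    With the defaults: log2 <= -1.1 -> 0, <= -0.25 -> 1, <= 0.2 -> 2,
    <= 0.7 -> 3. Assumes a diploid autosomal segment.
    """
    for cn, cutoff in enumerate(thresholds):
        if log2 <= cutoff:
            return cn
    # Assumed fallback above the last cutoff: convert to absolute copies, round up.
    return math.ceil(ploidy * 2 ** log2)


rows = [0.224267, -0.207029, 0.0982319, -0.222901, 0.140716,
        -0.117483, 0.0339522, -0.0921905, 0.157111]
print([call_copy_number(x) for x in rows])
# [3, 2, 2, 2, 2, 2, 2, 2, 2]
```

Only the first row exceeds 0.2, so it is the only one called as CN = 3, which matches the cn column in the file.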

Hope this helped! Have a nice day, Felix.

Marmach commented 5 months ago

Hi, I recently updated CNVkit (I used 0.9.5 for a looong time) and noticed that batch now also produces call.cns files with a p_ttest column and without the ci_hi / ci_lo columns. However, when I generate call.cns files manually, the latter are preserved while p_ttest is not calculated. I wanted to ask whether the p-value can be calculated here using some built-in command or function, or only manually from the confidence intervals?