etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org
Other
540 stars 164 forks source link

Difference of log2 values between .cnr file and .bintest #621

Closed Fponelle closed 2 years ago

Fponelle commented 3 years ago

Hi,

Why log2 values are different between cnr file and bintest.cns file?

Here is my example:

- In .cnr file, all bins on BRCA2 have a log2 between 1 and 2.5 suggesting and amplification of the entire gene:

chromosome start end gene log2 depth weight
chr13 32890559 32890670 BRCA2_ex02 1.92172 11112.5 0.989018
chr13 32893193 32893468 BRCA2_ex03 2.06781 9570 0.989823
chr13 32899192 32899327 BRCA2_ex04 1.29903 4212.33 0.989634
chr13 32900217 32900293 BRCA2_ex05 1.04573 1308.66 0.98301
           
chr13 32968805 32969076 BRCA2_ex25 1.86698 12887.1 0.993064
chr13 32971014 32971187 BRCA2_ex26 1.72835 12128.8 0.991547
chr13 32972278 32972595 BRCA2_ex27 1.55005 14927.4 0.993474
chr13 32972595 32972913 BRCA2_ex27 1.88532 14994.7 0.993332

- .cns file agrees with .cnr file showing a log2 for the whole gene of 1.64:

chromosome start end gene log2 depth probes weight ci_lo ci_hi
chr13 30937699 32972913 BRCA2_ex02,BRCA2_ex03,BRCA2_ex04,BRCA2_ex05,BRCA2_ex06,BRCA2_ex07,BRCA2_ex08,BRCA2_ex09,BRCA2_ex10,BRCA2_ex11,BRCA2_ex13,BRCA2_ex14,BRCA2_ex15,BRCA2_ex16,BRCA2_ex17,BRCA2_ex18,BRCA2_ex19,BRCA2_ex20,BRCA2_ex21,BRCA2_ex22,BRCA2_ex23,BRCA2_ex24,BRCA2_ex25,BRCA2_ex26,BRCA2_ex27 1.63355 7483.63 60 58.3449 1.5957 1.66772

- call.cns reports a CNV on the BRCA2 gene by counting 7 copies with a mean log2 of 1.6:

chromosome start end gene log2 cn depth p_ttest probes weight
chr13 30937699 32972913 BRCA2_ex02_UTR,BRCA2_ex02,BRCA2_ex03,BRCA2_ex04,BRCA2_ex05,BRCA2_ex06,BRCA2_ex07,BRCA2_ex08,BRCA2_ex09,BRCA2_ex10,BRCA2_ex11,BRCA2_ex13,BRCA2_ex14,BRCA2_ex15,BRCA2_ex16,BRCA2_ex17,BRCA2_ex18,BRCA2_ex19,BRCA2_ex20,BRCA2_ex21,BRCA2_ex22,BRCA2_ex23,BRCA2_ex24,BRCA2_ex25,BRCA2_ex26,BRCA2_ex27 1.60607 7 7483.63 1.29963e-39 60 58.3449

- But in bintest.cns file, for the same bins, log2 are really different, some of them are negative: (Initialy those bins were not reported in bintest file, I modified alpha value to 1 in bintest.py to understand why my bins were not reported as significative.)

chr13 32890559 32890670 BRCA2_ex02 0.288171 11112.5 0.989018 0.0161461
chr13 32893193 32893468 BRCA2_ex03 0.434261 9570 0.989823 8.8159e-05
chr13 32899192 32899327 BRCA2_ex04 -0.334525 4212.33 0.989634 0.00354805
chr13 32900217 32900293 BRCA2_ex05 -0.587826 1308.66 0.98301 3.62753e-05
             
chr13 32968805 32969076 BRCA2_ex25 0.233426 12887.1 0.993064 0.014072
chr13 32971014 32971187 BRCA2_ex26 0.0948008 12128.8 0.991547 0.411222
chr13 32972278 32972595 BRCA2_ex27 -0.0835012 14927.4 0.993474 0.411222
chr13 32972595 32972913 BRCA2_ex27 0.251767 14994.7 0.993332 0.00656489

Can you help me understand why there is such a difference?

Thanks

Flora

etal commented 3 years ago

Is this happening with CNVkit version 0.9.8?

Fponelle commented 3 years ago

I'm sorry I forgot to specify the version but yes I am using 0.9.8

Fponelle commented 3 years ago

Hi @etal , I would like to contact you again to find out if you have found some answers to my problem?

tskir commented 3 years ago

Hi @Fponelle, looking at the code, it looks like the log2 values in the bintest output are residuals: https://github.com/etal/cnvkit/blob/25d91cdb962639c98d041c3b05c7de659c41dc70/cnvlib/bintest.py#L26

If the segments are provided, the residuals are calculated compared to the segment to which the bin belongs, and otherwise to the whole chromosome: https://github.com/etal/cnvkit/blob/25d91cdb962639c98d041c3b05c7de659c41dc70/cnvlib/cnary.py#L417-L429

I suspect that you're running bintest with the segments specified. And so, given that the whole BRCA2 is duplicated, the change of CN of any given bin is not relevant compared to the BRCA2 segment.

Could you try re-running without the -s parameter to see if this fixes your issue?

In any case, it looks like documentation on bintest could be improved to reflect this behaviour.

Fponelle commented 3 years ago

Hi @tskir Thank you for your answer, my problem is solved. By default in the "batch" command the segments are specified, you must launch the "bintest" command separately to be able to remove this parameter.

etal commented 3 years ago

Yes, that's right. With -s, you're telling bintest that those are the alterations you already know about, and asking for any additional bin-level alterations. Maybe that subtraction should be used only for the reported p-values, and not alter the log2 values in the output.

tskir commented 3 years ago

Note to self — Leaving this issue open to improve bintest documentation in the future

tskir commented 3 years ago

Help wanted: for someone to document bintest and add the section to the relevant part of CNVkit docs