Closed leoSeattle closed 7 years ago
In any .cnr file the log2 ratio is the ratio of the normalized coverage depth at a bin versus the normalized reference coverage. Normalized means recentered (in log2 scale) so that the genome-wide average bin log2 value is 0. If the reference is a pool of normals, the reference log2 value is the pool's average; if the reference is flat/generic, then the output log2 values are relative to the genome-wide average log2 value.
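To make the description above concrete, here is a minimal sketch of the recentering and subtraction it describes. It is not CNVkit's actual code: the function names and toy values are hypothetical, and recentering the reference a second time is an assumption made only for illustration.

```python
from statistics import fmean


def recenter(log2_values):
    """Shift log2 values so that their genome-wide mean is 0."""
    center = fmean(log2_values)
    return [v - center for v in log2_values]


def log2_ratios(sample_log2, reference_log2):
    """Per-bin difference of recentered sample and reference log2 coverage."""
    sample = recenter(sample_log2)
    reference = recenter(reference_log2)
    return [s - r for s, r in zip(sample, reference)]


# Hypothetical per-bin log2 coverages, not taken from any real .cnn file
sample_log2 = [1.0, 2.0, 3.0, 4.0]
reference_log2 = [0.5, 1.5, 2.5, 4.5]

ratios = log2_ratios(sample_log2, reference_log2)
print(ratios)
```

Because both inputs are recentered first, the resulting ratios are relative values: a bin's ratio depends on the genome-wide average, not only on that bin's own depth.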
I have a question similar to leoSeattle's. When I try to calculate the log2 value in the cnr file using the method etal described above, I don't get matching values. Here is an example:
reference.cnn
chromosome start end gene log2 depth gc spread
chr1 12050 12277 LOC102725121,DDX11L1 -4.01333 3.46351 0.515419 0.541461
targetcoverage.cnn
chromosome start end gene depth log2
chr1 12050 12277 LOC102725121,DDX11L1 3.66079 1.87216
And in the cnr file I get the following:
chromosome start end gene depth log2 weight
chr1 12050 12277 LOC102725121,DDX11L1 3.66079 -0.487125 0.371021
So according to the description above, the log2 value in the cnr file should be log2(3.66079/3.46351). However, that works out to approximately 0.0799, not the -0.487125 shown above.
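The arithmetic in this example can be checked with a short script. This is only a sketch using the values from the three rows quoted above; the variable names are mine, and it makes no claim about what CNVkit does internally.

```python
import math

# Values copied from the rows quoted above for chr1:12050-12277
reference_depth = 3.46351   # 'depth' in reference.cnn
reference_log2 = -4.01333   # 'log2' in reference.cnn
sample_depth = 3.66079      # 'depth' in targetcoverage.cnn
sample_log2 = 1.87216       # 'log2' in targetcoverage.cnn
reported_log2 = -0.487125   # 'log2' in the .cnr file

# Naive ratio of the two depth columns
naive_log2 = math.log2(sample_depth / reference_depth)

# Plain difference of the two log2 columns, for comparison
log2_difference = sample_log2 - reference_log2

print(f"log2 of depth ratio:     {naive_log2:.4f}")
print(f"difference of log2 cols: {log2_difference:.5f}")
print(f"reported in .cnr:        {reported_log2}")
```

Neither quantity reproduces the reported value: the depth ratio gives about 0.08 and the raw difference of the log2 columns gives about 5.885.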
Thank you for your time and help in advance.
The 'depth' column is there for information and for convenience in filtering out no-coverage bins. CNVkit does this instead:
So what log2 value in the cnr file implies a potential gain? Is it a positive value (>0.0) or a value above 1.0?
Eric, I have a problem: I cannot reconcile the results in the cnr/cns data with what I see in IGV. For your information, I am using matched tumor/normal WES data.
Here is a screenshot of MYC in one of the samples using cnvkit.py scatter
Here is an IGV screenshot of MYC in the same sample
As you can see in the cnvkit.py scatter plot, the log2 values in MYC are positive but below 1, yet the same region looks like a gain in IGV (top track is normal and bottom track is tumor; the y-axis maximum is 450 in both tracks). Is this just an over-segmentation problem?
Thank you for your response, as always, Eric!
I have a general question regarding the log2 ratio of normal reference samples. I followed the suggestion in the documentation to filter out noisy normal samples by running reference, coverage, and segment, and finally ran scatter using ONLY normal samples. I am a bit confused about the log2 ratio in the resulting cnr and cns files and in the scatter plot.
My understanding is that the log2 ratio is the log2 of the ratio between read counts from the normal and read counts from the disease samples. But when only normal samples are used, what does the log2 ratio mean? How is it calculated? Thanks