etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

How can I run a gender-specific CNV analysis with CNVkit? #635

Open enes-ak opened 3 years ago

enes-ak commented 3 years ago

Hi,

I have recently started doing CNV analysis with CNVkit.

So far I have run the analysis without any gender parameter. As a result, when a sample comes from a woman, my results include a loss of the Y chromosome, which does not make sense.

As far as I know, a gender parameter can be given when plotting the CNV diagram.

Like this:

But I want to supply this parameter at the beginning of the CNVkit pipeline. Is there any way to do that?

I am also confused about the weight column. What if the weight is high but the log2 value is close to zero (for example 0.2), or vice versa? How should I interpret that situation?

My last question is about the diagram plot: some regions are shown in pale blue/red and others in really dark red/blue. How should I interpret that? I guess the pale ones are unreliable and the dark ones are reliable. This probably comes from the log2 ratio, but I would be glad if someone could give me more details.

Thanks.

tskir commented 3 years ago

Hi @enes-ak,

So far I have run the analysis without any gender parameter. As a result, when a sample comes from a woman, my results include a loss of the Y chromosome, which does not make sense.

Sex detection in CNVkit has a history of issues; however, in general, this should not happen. The batch command should correctly identify which samples are male/female and process them correctly all the way to the end. When you say that the results include loss of the Y chromosome, can you please tell me at which processing step you observe this? That is, which file or diagram appears to show loss of chrY for female samples?

But I want to supply this parameter at the beginning of the CNVkit pipeline. Is there any way to do that?

Just to make sure I understand you correctly: do you want to separate the samples into male and female and do two cnvkit.py batch runs, one with all male and one with all female samples? This is generally not recommended, because the more samples you have in the pooled reference, the more accurate it will be. However, if you absolutely must do this, you can play with the --male-reference parameter, which specifies how chrX/chrY coverages are stored in the reference files.

I am also confused about the weight column. What if the weight is high but the log2 value is close to zero (for example 0.2), or vice versa? How should I interpret that situation?

The weight column gives an estimate of how accurate the log2 values are. You can think of it as related to the standard deviation of a value, but inverted, so that a larger weight means a more reliable value (in reality, a more complicated metric is used, but along the same lines).

So, regardless of the log2 value, low weight means low quality of the estimate and high weight means high quality.
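To make the role of weight concrete, here is a minimal sketch that assumes bin-level log2 values are combined with a weighted average when they are summarized. This is a simplification for illustration, not CNVkit's exact code, and the bin values and weights are made up.

```python
def weighted_mean_log2(log2_values, weights):
    """Weighted average of bin-level log2 ratios (illustrative only)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(log2_values, weights)) / total


# Hypothetical bins: two high-weight bins near 0.2 and one low-weight outlier.
log2_values = [0.2, 0.2, -1.0]
weights = [0.9, 0.9, 0.1]

plain_mean = sum(log2_values) / len(log2_values)
print("unweighted mean:", round(plain_mean, 3))
print("weighted mean:  ", round(weighted_mean_log2(log2_values, weights), 3))
```

The low-weight outlier pulls the unweighted mean below zero, while the weighted mean stays close to the two high-weight bins. A high weight with a log2 near 0.2 therefore just means "confidently close to neutral".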

My last question is about the diagram plot: some regions are shown in pale blue/red and others in really dark red/blue. How should I interpret that?

Red means copy number gain and blue means copy number loss compared to normal.

Please let me know if you have any further questions; I'll be happy to answer them.

enes-ak commented 3 years ago

Hi @tskir, the cns file shows the whole of chrY as deleted. When I plot the cns and cns.call files, I see this result:

Screenshot from 2021-07-05 10-25-54

The sample may be from a woman, but it is still confusing to see the result presented this way, isn't it?

Actually, for now I am analysing samples one by one and I do not use the batch command. I just want to provide the gender information at the beginning of the pipeline, but at the moment I only provide it to the diagram command. Is there any way to use gender information at the beginning of the CNVkit pipeline?

Thanks for answering my question about weight; it is clearer to me now.

I know that gain is shown in red and loss in blue, but in the diagram plot some regions are dark red and some are pale red, and likewise for blue. What does that mean? Where do the dark and pale shades come from? Are results in dark regions more reliable than those in pale regions, and how should I interpret this?

Thanks! Enes

tetedange13 commented 3 years ago

Hi @enes-ak,

It is simply a diverging colormap => Each log2 value is mapped to a shade of color, so that a negative value is blue-ish and a positive value is red-ish => You will see that some colormaps, like "seismic" in my link, map "extreme" values to dark-red and dark-blue colors.

CNVkit in particular uses a custom colormap, but the idea is still the same: no reliability is involved here, as @tskir told you, simply log2 values that are either very negative (= dark-blue) or very positive (= dark-red).
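The idea of a diverging colormap can be sketched in a few lines. This is a generic blue-white-red scale written for illustration, not CNVkit's custom colormap; the function name and the `limit` parameter are hypothetical.

```python
def diverging_color(log2, limit=2.0):
    """Map a log2 ratio to an (R, G, B) tuple on a blue-white-red scale."""
    # Clamp so that values beyond +/- limit all get the strongest shade.
    x = max(-limit, min(limit, log2)) / limit  # now in [-1, 1]
    fade = int(round(255 * (1 - abs(x))))      # 255 = white, 0 = full color
    if x >= 0:
        return (255, fade, fade)  # white -> red for gains
    return (fade, fade, 255)      # white -> blue for losses


for value in (-2.0, -0.3, 0.0, 0.3, 2.0):
    print(value, diverging_color(value))
```

The shade depends only on how far the log2 value is from zero, so a pale region is a small deviation and a strongly colored region is a large one. Reliability does not enter into it.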

It would be way easier for you if you were using batch, because, as @tskir told you, it will try to guess the sex itself and produce coherent results (plus the --male-reference parameter to play with). Otherwise, if you absolutely want to stick with the "running each step one by one" approach, please read the dedicated section of CNVkit's documentation for further details => The key steps where sex will matter should be cnvkit.py reference and cnvkit.py call, I think.
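For the step-by-step approach, a small helper like the one below could assemble the call command with the sex stated explicitly. It is only a sketch: the helper and file names are hypothetical, and the option names (`--sample-sex`, `--male-reference`, `-o`) are what I understand from the CNVkit documentation, so please check `cnvkit.py call -h` for your installed version.

```python
import shlex


def call_command(cns_path, output_path, sample_sex=None, male_reference=False):
    """Build a `cnvkit.py call` command line with explicit sex options."""
    cmd = ["cnvkit.py", "call", cns_path, "-o", output_path]
    if sample_sex is not None:
        if sample_sex not in ("male", "female"):
            raise ValueError("sample_sex must be 'male' or 'female'")
        # Assumed option name; older releases may spell it differently.
        cmd += ["--sample-sex", sample_sex]
    if male_reference:
        # Assumed flag: states that the reference was built as male.
        cmd.append("--male-reference")
    return cmd


# Hypothetical file names, printed rather than executed.
print(shlex.join(call_command("Sample.cns", "Sample.call.cns", sample_sex="female")))
```

The command is only printed here; pass the list to `subprocess.run` once the options match your CNVkit version.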

Have a nice day. Felix.

enes-ak commented 3 years ago

Thank you so much @tetedange13,

My last question is about the cnvkit.py call command. The default thresholds for calling copy number are: -1.1 => 0, -0.25 => 1, 0.2 => 2, 0.7 => 3.

If I am not mistaken, the call command uses these thresholds to assign a discrete CN to each segment. When I checked, I saw that these values are not valid for the sex chromosomes of male samples, because the sex chromosomes of males are haploid. What are the default thresholds for male sex chromosomes?

Haploid calling thresholds:

I calculated the thresholds myself with this formula: log2[(cn+0.5)/1], where the 1 comes from the ploidy.

for cn = 0 (loss) ------> -1
for cn = 1 (no alteration) ------> 0.5849625
for cn = 2 (gain) ------> 1.321928
for cn = 3 (gain) ------> 1.807355

Are these thresholds correct?
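The arithmetic in the formula above can be checked with a short script. It only evaluates log2[(cn+0.5)/ploidy] as written in this comment; whether CNVkit applies these values internally for haploid chromosomes is not something the script shows. The function name is hypothetical.

```python
import math


def midpoint_thresholds(ploidy, max_cn=3):
    """Evaluate log2((cn + 0.5) / ploidy) for cn = 0..max_cn."""
    return [math.log2((cn + 0.5) / ploidy) for cn in range(max_cn + 1)]


# Haploid case (ploidy = 1), e.g. chrX/chrY in a male sample.
for cn, threshold in enumerate(midpoint_thresholds(ploidy=1)):
    print(f"cn = {cn}: log2 <= {threshold:.6f}")
```

Calling `midpoint_thresholds(ploidy=2)` gives the same kind of midpoints for diploid chromosomes, which can be compared with the defaults quoted above.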

gorgitko commented 9 months ago

@tetedange13 @etal Hi, I am still a bit confused about the gender-specific analysis (and yeah, I have also read https://cnvkit.readthedocs.io/en/stable/sex.html). My panel also covers some regions of chr X and Y. In normal samples I have a mix of genders. Should I use the batch command several times like this:

Or alternatively:

Thanks in advance for clarifying this!

etal commented 9 months ago

@gorgitko Ideally it should work if you use a mix of male and female normal and tumor samples all together in a single batch. Watch the status logs to see whether the "detected" genetic sex of each normal and tumor sample matches what you expected. If they all match correctly, then your results should be fine. If the detected genetic sex is incorrect for any of the input samples, then you can try the analysis specifying the sexes explicitly.
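The check described above amounts to comparing two tables. Here is a minimal sketch, assuming you have already collected the expected and detected sexes into dictionaries; the sample names are placeholders, and how you extract the detected values from the logs is up to you.

```python
def sex_mismatches(expected, detected):
    """Return sample IDs whose detected sex differs from the expected one."""
    return sorted(
        sample
        for sample, sex in expected.items()
        if detected.get(sample, "").lower() != sex.lower()
    )


# Placeholder sample names and values, for illustration only.
expected = {"normal_1": "Female", "normal_2": "Male", "tumor_1": "Male"}
detected = {"normal_1": "Female", "normal_2": "Male", "tumor_1": "Female"}

print(sex_mismatches(expected, detected))
```

Only the samples listed by this check would need a rerun with the sex given explicitly.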

Also consider the new functionality for handling PAR in the current development version of CNVkit: https://github.com/etal/cnvkit/pull/789

gorgitko commented 9 months ago

@etal Thanks for the reply. It's not easy to find this information in the logs, so I have used the sex command. The accuracy is:

For these incorrectly detected male tumor samples, should I rerun the batch command using the male normal reference and --male-reference?