Any sequencing reads that map to chrY in female samples can be treated as noise: either misaligned, or ambiguously aligned to pseudogenes or the pseudoautosomal region.
Theoretically, the log2 read-depth ratio on chrY for female samples normalized to a female reference is log2(0/0) = NaN. CNVkit also fills in missing log2 values with -20 (which then drifts a bit after GC correction and re-centering).
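To make that arithmetic concrete, here is a minimal sketch (not CNVkit's own code) of a per-bin log2 ratio computed two ways: directly, and as a difference of log2 coverages with the -20 floor described above for empty bins. The function names and the `NULL_LOG2` constant are hypothetical.

```python
import math

NULL_LOG2 = -20.0  # floor for bins with no reads, per the description above (assumed value)


def naive_log2_ratio(sample_depth: float, ref_depth: float) -> float:
    """Direct log2(sample/ref); undefined (NaN) when the reference depth is zero."""
    if ref_depth == 0:
        # 0/0 (or x/0) has no finite log2 ratio
        return math.nan
    if sample_depth == 0:
        return -math.inf
    return math.log2(sample_depth / ref_depth)


def log2_coverage(depth: float) -> float:
    """Log2 of a bin's depth, replacing log2(0) with the fixed floor."""
    return math.log2(depth) if depth > 0 else NULL_LOG2


def floored_log2_ratio(sample_depth: float, ref_depth: float) -> float:
    """Ratio computed as the difference of floored log2 coverages."""
    return log2_coverage(sample_depth) - log2_coverage(ref_depth)


# Female sample vs. an unaltered female reference on chrY: no reads in either
print(naive_log2_ratio(0, 0))    # nan
print(floored_log2_ratio(0, 0))  # 0.0
```

With an unaltered reference the two floors would simply cancel to 0, which is why the chrY adjustment to the reference discussed in this thread is what produces the large negative values.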
For practicality, CNVkit makes chrY haploid in a "female" reference, so regardless of the reference gender, chrY will show log2 values around 0 for male samples and some arbitrary negative number for female samples; the values you see here are typical.
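A simplified model of that behaviour, assuming (as described in this thread) that the female reference carries a log2 value of -1 on chrY and that an empty bin is floored at -20. Real values are further shifted by bias correction and re-centering, so only the rough magnitudes are meaningful; the names are hypothetical.

```python
NULL_LOG2 = -20.0       # assumed floor for bins with no reads
FEMALE_REF_CHRY = -1.0  # chrY value in a reference built from female controls (haploid Y)


def chry_log2_ratio(sample_log2_coverage: float) -> float:
    """Sample chrY log2 coverage (relative to autosomes) minus the reference's chrY value."""
    return sample_log2_coverage - FEMALE_REF_CHRY


# Male sample: one chrY copy vs. two autosomal copies, i.e. about log2(1/2) = -1
male = chry_log2_ratio(-1.0)
# Female sample: no real chrY reads, so the floor value
female = chry_log2_ratio(NULL_LOG2)
print(male, female)  # 0.0 -19.0
```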
If your cohort is all female, it would be reasonable to just delete chrY from either your reference or each of your samples. Then chrY wouldn't show up in the resulting plots and tables.
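If you do go that route, the filtering itself is simple, because CNVkit's .cnn/.cnr/.cns files are tab-separated tables with a `chromosome` column. The sketch below uses only the standard library; the helper name `drop_chromosome` is hypothetical, and it assumes the contig is named `chrY` (pass `"Y"` for Ensembl-style naming).

```python
import csv
import io
from typing import TextIO


def drop_chromosome(src: TextIO, dst: TextIO, chrom: str = "chrY") -> int:
    """Copy a tab-separated CNVkit table, skipping rows on `chrom`.

    Returns the number of rows removed.
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    removed = 0
    for row in reader:
        if row["chromosome"] == chrom:
            removed += 1
            continue
        writer.writerow(row)
    return removed


# Tiny in-memory example; real files would be opened with open(path, newline="")
table = "chromosome\tstart\tend\tlog2\nchrX\t100\t200\t0.0\nchrY\t300\t400\t-19.0\n"
out = io.StringIO()
print(drop_chromosome(io.StringIO(table), out))  # 1
```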
Sorry,
Theoretically speaking, treating reads on Y in a female sample vs. a female reference as noise is only roughly correct. In my opinion one should be stricter: the idea holds only for spurious reads mapping outside the PARs (i.e. reads picked up by capture affinity in WES, or contamination-like reads in WGS). Females with Y sequences do exist, e.g. with a Disorder of (Sexual) Development, or 'normal' females carrying a Y segment as a gain on another chromosome.
PAR1 and PAR2 can often be handled in software, although their borders shift a bit depending on the underlying technique and software.
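One way to separate the two kinds of chrY signal is to tag each bin by whether it overlaps a pseudoautosomal region. The intervals below are the GRCh38 chrY PAR1/PAR2 coordinates, converted to BED-style 0-based half-open form; they differ on hg19/GRCh37, and as noted above the borders vary with technique and software, so treat them as an assumption to check against your own build. The names are hypothetical.

```python
from typing import List, Tuple

# GRCh38 chrY pseudoautosomal regions, 0-based half-open (BED-style)
GRCH38_CHRY_PARS: List[Tuple[int, int]] = [
    (10_000, 2_781_479),       # PAR1: chrY:10,001-2,781,479 (1-based, inclusive)
    (56_887_902, 57_217_415),  # PAR2: chrY:56,887,903-57,217,415 (1-based, inclusive)
]


def overlaps_par(start: int, end: int,
                 pars: List[Tuple[int, int]] = GRCH38_CHRY_PARS) -> bool:
    """True if the BED-style chrY bin [start, end) overlaps any PAR interval."""
    return any(start < p_end and end > p_start for p_start, p_end in pars)


bins = [(500_000, 500_200), (7_000_000, 7_000_200), (57_000_000, 57_000_200)]
print([overlaps_par(s, e) for s, e in bins])  # [True, False, True]
```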
So, while it is indeed reasonable to exclude Y from your analysis, I would not do so, as hiding it might also hide real findings. But that probably depends a lot on your research question; I am looking at it from the perspective of diagnostics.
Best, Jasper
From: Eric Talevich, sent 16 February 2018 02:36. Re: [etal/cnvkit] What are the expected log2 values on chrY for female sample vs female reference (#318)
Thanks very much for the helpful response. I just have a clarifying question about "CNVkit makes chrY haploid in a 'female' reference". From the documentation sex.rst, I understand that by default chrX is considered diploid, and male reference samples have coverage on X doubled to resemble a diploid X. How are coverages scaled in female samples to resemble haploid Y? Are depth values in chrY bins doubled? Thanks again.
When building the reference, the chrY values from apparent female control samples are all replaced with -1. This makes it possible to normalize a chromosomally male test sample (i.e. any containing a real chrY) to a reference built from all chromosomally normal female samples, and addresses @JspSrs's other caveats.
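A rough sketch of the sex-chromosome adjustments discussed here, applied to one control sample's per-bin log2 values before they are combined into a reference with a diploid chrX. It is a simplified model, not CNVkit's reference-building code: the function name and the dict-of-lists layout are hypothetical, the chrY replacement follows the reply above, and the +1 shift for male chrX (doubling depth is +1 in log2 space) follows the reading of sex.rst in the question.

```python
from typing import Dict, List


def adjust_control_for_reference(log2_by_chrom: Dict[str, List[float]],
                                 is_male: bool) -> Dict[str, List[float]]:
    """Return an adjusted copy of one control's per-bin log2 values.

    - male control: chrX depth doubled to mimic a diploid X (+1 in log2)
    - female control: chrY values replaced with -1 (haploid Y)
    The input dict is not modified.
    """
    adjusted = {chrom: list(values) for chrom, values in log2_by_chrom.items()}
    if is_male and "chrX" in adjusted:
        adjusted["chrX"] = [v + 1.0 for v in adjusted["chrX"]]
    if not is_male and "chrY" in adjusted:
        adjusted["chrY"] = [-1.0] * len(adjusted["chrY"])
    return adjusted


female_control = {"chr1": [0.02, -0.01], "chrX": [0.0, 0.03], "chrY": [-20.0, -19.6]}
print(adjust_control_for_reference(female_control, is_male=False)["chrY"])  # [-1.0, -1.0]
```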
I agree it's surprising, but the alternatives all seem worse in one way or another. In the development/upcoming version of CNVkit, the documentation no longer refers to the reference "sex", and instead just describes the option of whether chrX should be haploid in the reference.
Hello, I'm using cnvkit v0.9.1 and I am confused about the negative log2 values for chrY that I get when comparing a female sample to a female reference.
I have built a female reference using a cohort of 20 female WES samples:
Then I'm running individual samples from this cohort against this reference using batch mode:
I typically see segments with negative log2 values, and I am concerned that I do not have the right method.
The scatter plot looks like this:
The inferred copy number is always zero, as expected for chrY for female, but I expected log2 values ~0, instead they are large and negative. Thanks for any insight and for a great tool!
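Why a large negative log2 value still yields a copy number of zero: under the usual relationship between a log2 ratio and absolute copy number, copies = expected copies × 2^log2, rounded. A minimal sketch, assuming an expected chrY copy number of 1 (the haploid reference) and ignoring the purity and ploidy adjustments that CNVkit's `call` command can apply; the function name is hypothetical.

```python
def absolute_copies(log2_ratio: float, expected_copies: int = 1) -> int:
    """Round expected_copies * 2**log2_ratio to the nearest integer copy number."""
    return round(expected_copies * 2 ** log2_ratio)


for value in (0.0, -5.0, -19.0):
    print(value, absolute_copies(value))
# 0.0 1
# -5.0 0
# -19.0 0
```

So any chrY log2 value far below 0 rounds to zero copies, which is consistent with the calls you see even though the log2 values themselves look alarming.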