dhimmel / snplentiful

SNP abundance correlates with network degree
https://doi.org/bffr
Creative Commons Zero v1.0 Universal

Diagnosing a major and unexpected change in the HH550 plot #3

Closed dhimmel closed 9 years ago

dhimmel commented 9 years ago

HH550 through b795aa195a2c2f35bafb1e6b33c4819bff30d0a7

HH550 beginning on 5238adf66c9a2566e04ca81666fa6b4d47394fa1

Problem

e783af22b660ad956cf9ea87994ece7e12ff3360 fixed a bug that caused all genes on chromosomes X and Y to receive 0 SNPs. The effect on 500K, HO1, and ExAC was minimal. However, the relationships for HH550 changed drastically, with very low degrees being estimated for genes with low SNP abundance.

dhimmel commented 9 years ago

In 3007ac01d7cc4a753c277a4fd5a46771f58b5ce0, we confirm that HH550 SNP abundances for genes on chromosomes 1-22 did not change.
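For anyone wanting to repeat this kind of consistency check, here is a minimal sketch of the idea. It is not the code from the commit above: the gene names, SNP counts, and the `changed_genes` helper are all hypothetical, and the data layout (a dict keyed by chromosome and gene) is an assumption made for illustration.

```python
# Hypothetical per-gene SNP counts keyed by (chromosome, gene).
# All values are made up for illustration.
before = {("1", "GENE_A"): 12, ("2", "GENE_B"): 0,
          ("X", "GENE_C"): 0, ("Y", "GENE_D"): 0}
after = {("1", "GENE_A"): 12, ("2", "GENE_B"): 0,
         ("X", "GENE_C"): 7, ("Y", "GENE_D"): 1}

AUTOSOMES = {str(i) for i in range(1, 23)}  # chromosomes 1-22


def changed_genes(before, after, chromosomes):
    """Return genes on the given chromosomes whose SNP count differs."""
    keys = {k for k in before.keys() | after.keys() if k[0] in chromosomes}
    return sorted(k for k in keys if before.get(k) != after.get(k))


autosomal_changes = changed_genes(before, after, AUTOSOMES)
sex_changes = changed_genes(before, after, {"X", "Y"})
print("autosomal genes that changed:", autosomal_changes)
print("chrX/chrY genes that changed:", sex_changes)
```

An empty list for the autosomes is what "did not change" means here; any difference should be confined to chrX and chrY.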

For chrX, 89% of genes had at least one SNP on HH550 in the revised dataset. For chrY, only 20% of genes had at least one SNP.
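The per-chromosome percentages can be computed along these lines. This is a sketch under assumptions: the `records` list and the `fraction_with_snps` helper are hypothetical, and the toy counts are made up rather than taken from the HH550 data.

```python
from collections import defaultdict

# Hypothetical (chromosome, SNP count) records for one platform.
# All values are made up for illustration.
records = [("X", 3), ("X", 0), ("X", 5), ("X", 1),
           ("Y", 0), ("Y", 0), ("Y", 2), ("Y", 0)]


def fraction_with_snps(records):
    """Return, per chromosome, the fraction of genes with at least one SNP."""
    totals = defaultdict(int)
    nonzero = defaultdict(int)
    for chrom, n_snps in records:
        totals[chrom] += 1
        if n_snps > 0:
            nonzero[chrom] += 1
    return {chrom: nonzero[chrom] / totals[chrom] for chrom in totals}


fractions = fraction_with_snps(records)
for chrom, frac in sorted(fractions.items()):
    print(f"chr{chrom}: {frac:.0%} of genes have at least one SNP")
```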

I find it unlikely that changes to a small number of genes could have such a drastic effect.

dhimmel commented 9 years ago

SNP abundance distributions before and after the drastic change

As expected, fixing the chrX and chrY bug led to fewer genes with 0 SNPs (notebook). This affected all platforms, so it is unclear why only HH550 would experience a drastic change.
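A quick way to see how much each platform was affected is to count zero-SNP genes before and after the fix. The sketch below is not taken from the notebook: the per-platform counts are made up and `zero_counts` is a hypothetical helper.

```python
# Hypothetical SNP counts per gene for two platforms, before and after the fix.
# All values are made up for illustration.
before = {"HH550": [0, 0, 4, 9], "HO1": [0, 0, 0, 2]}
after = {"HH550": [0, 3, 4, 9], "HO1": [0, 1, 5, 2]}


def zero_counts(abundances):
    """Return the number of genes with 0 SNPs for each platform."""
    return {platform: sum(1 for n in counts if n == 0)
            for platform, counts in abundances.items()}


zeros_before = zero_counts(before)
zeros_after = zero_counts(after)
# Positive values mean fewer zero-SNP genes after the fix.
drop = {p: zeros_before[p] - zeros_after[p] for p in before}
print(drop)
```

A platform whose drop is much larger than the others is the first place to look when a downstream plot changes shape.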

For reference, chrX contains 692 genes and chrY contains 20.

dhimmel commented 9 years ago

Expert diagnosis

We recreated the plots from the first post in Python using seaborn. First, however, we had to diagnose and fix a seaborn bug (https://github.com/mwaskom/seaborn/pull/686) that was drawing scatterplots in addition to curves.

The result is shown below, using lowess curves and with the chrXY-void dataset on the top row and the corrected dataset on the bottom row (notebook):

So... our facets created in ggplot2 were incorrectly labeled. The drastic change from correcting the chrXY abundances is real but occurred for HO1 rather than HH550. This finding is consistent with the SNP abundance histograms, where HO1 underwent a drastic change with the chrXY bugfix.