Cloufield / gwaslab

A Python package for handling and visualizing GWAS summary statistics. https://cloufield.github.io/gwaslab/
GNU General Public License v3.0

Frequency correlation plot #22

Open swvanderlaan opened 1 year ago

swvanderlaan commented 1 year ago

Admittedly I haven't used it yet, but reading your paper it crossed my mind: is there a function to create a correlation plot of the allele frequencies from the given GWAS and the given reference?

If not, this would be very useful.

Cloufield commented 1 year ago

Hi, thanks for the suggestion. I have already implemented functions for this purpose (https://cloufield.github.io/gwaslab/Harmonization/#check-the-difference-in-allele-frequency). However, this function is not yet optimized and takes a very long time to run. I am still thinking about how to improve the implementation, and I will let you know when I finish.

swvanderlaan commented 1 year ago

It's probably okay to do this on a subset of a given dataset, for instance 10%. If the purpose is to get an idea of the quality of the data, I think that's just fine. Of course, it would be great to get a table with the actual per-SNP difference so that one can filter or otherwise take action.

Cloufield commented 1 year ago

Thanks for your comment. For a quick inspection of the data quality, I totally agree with you. Actually, this can be done by using random_variants() with the current version. I will integrate this into check_af soon.

import gwaslab as gl

# mysumstats is a gl.Sumstats object that has already been loaded.
# Get 10,000 random variants from the dataset.
sample = mysumstats.random_variants(n=10000)

# Check allele frequencies against the reference and plot, using only the randomly selected variants.
sample.check_af(ref_infer=gl.get_path("1kg_eas_hg19"), ref_alt_freq="AF", n_cores=2)
sample.plot_daf(threshold=0.12, save="af_correlation.png", save_args={"dpi": 300})
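
For the per-SNP difference table asked about earlier in the thread, here is a minimal sketch in plain pandas. It does not use gwaslab itself. The column names EAF (effect allele frequency in the sumstats) and RAF (ALT allele frequency in the reference), the helper names, and the toy data are all assumptions made for illustration. It only shows the idea: compute the difference per variant, flag large deviations, and report the correlation.

```python
import pandas as pd


def af_difference_table(df, eaf_col="EAF", raf_col="RAF", threshold=0.12):
    """Return a copy of df with a per-variant frequency difference and a flag.

    Column names and the helper itself are hypothetical, not gwaslab API.
    """
    out = df.copy()
    # Difference between the study frequency and the reference frequency.
    out["DAF"] = out[eaf_col] - out[raf_col]
    # Mark variants whose absolute difference exceeds the threshold.
    out["FLAGGED"] = out["DAF"].abs() > threshold
    return out


def af_correlation(df, eaf_col="EAF", raf_col="RAF"):
    """Pearson correlation between study and reference allele frequencies."""
    # Series.corr ignores rows where either value is missing.
    return float(df[eaf_col].corr(df[raf_col]))


# Toy data standing in for a random subset of a real dataset.
toy = pd.DataFrame(
    {
        "SNPID": ["rs1", "rs2", "rs3", "rs4"],
        "EAF": [0.10, 0.45, 0.80, 0.30],
        "RAF": [0.12, 0.44, 0.50, 0.31],
    }
)

table = af_difference_table(toy, threshold=0.12)
flagged = table.loc[table["FLAGGED"], "SNPID"].tolist()
r = af_correlation(toy)

print(table)
print("Flagged variants:", flagged)
print("Pearson r:", round(r, 3))
```

The flagged rows can then be dropped or inspected by hand. On a real dataset the same two helpers could be applied to a random subset, for example one taken with DataFrame.sample(frac=0.1), to keep the run time short.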