Open swvanderlaan opened 1 year ago
Hi, thanks for the suggestion. I have already implemented functions for this purpose (https://cloufield.github.io/gwaslab/Harmonization/#check-the-difference-in-allele-frequency). However, this function is not optimized yet and takes a very long time to run. I am still thinking about how to improve the implementation and will let you know when I finish.
It's probably okay to do this on a subset of a given dataset, for instance 10%. If the purpose is to get an idea of the quality of the data, I think that's just fine. Of course, it would be great to get a table with the actual per-SNP difference so that one can filter or otherwise take action.
Thanks for your comment. For a quick inspection of the data quality, I totally agree with you. Actually, this can be done by using random_variants() with the current version. I will integrate this into check_af soon.
import gwaslab as gl

# mysumstats is an already loaded gl.Sumstats object
# get 10000 random variants from the dataset
sample = mysumstats.random_variants(n=10000)
# check allele frequencies against the reference and plot, using the randomly selected variants
sample.check_af(ref_infer=gl.get_path("1kg_eas_hg19"), ref_alt_freq="AF", n_cores=2)
sample.plot_daf(threshold=0.12, save="af_correlation.png", save_args={"dpi": 300})
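For the per-SNP table requested earlier in the thread, here is a rough sketch in plain Python that does not depend on gwaslab. It assumes the effect alleles have already been aligned between the GWAS and the reference. The function names, the dictionary layout, and the column names (SNPID, EAF, RAF, DAF, FLAGGED) are hypothetical choices for illustration, not the library's API. The 0.12 threshold is simply the one used in the plot_daf call above.

```python
from math import sqrt


def af_difference_table(gwas_af, ref_af, threshold=0.12):
    """Build a per-SNP table of allele frequency differences.

    gwas_af and ref_af map a variant ID to the frequency of the SAME allele
    (assumption: alleles are already harmonized). Variants missing from the
    reference are skipped.
    """
    rows = []
    for snp, eaf in gwas_af.items():
        if snp not in ref_af:
            continue
        daf = eaf - ref_af[snp]
        rows.append({
            "SNPID": snp,
            "EAF": eaf,          # frequency in the GWAS
            "RAF": ref_af[snp],  # frequency in the reference
            "DAF": daf,          # difference, GWAS minus reference
            "FLAGGED": abs(daf) > threshold,
        })
    return rows


def pearson_r(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy data, made up for illustration only.
gwas_af = {"rs1": 0.10, "rs2": 0.45, "rs3": 0.80, "rs4": 0.30}
ref_af = {"rs1": 0.12, "rs2": 0.40, "rs3": 0.50, "rs5": 0.25}

table = af_difference_table(gwas_af, ref_af)
r = pearson_r([row["EAF"] for row in table], [row["RAF"] for row in table])

# Variants one may want to filter or inspect further.
flagged = [row["SNPID"] for row in table if row["FLAGGED"]]
```

The flagged list is what one would filter on, and r is the single number a correlation plot would summarize.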
Admittedly I haven't used it yet, but reading your paper it crossed my mind: is there a function to create a correlation plot of the allele frequencies from the given GWAS and the given reference?
If not, this would be very useful.