Cloufield / gwaslab

A Python package for handling and visualizing GWAS summary statistics. https://cloufield.github.io/gwaslab/

GNU General Public License v3.0


More ancestries as download option? #48

Closed: swvanderlaan closed this issue 10 months ago

swvanderlaan commented 10 months ago

Would it be possible to also add the other ancestral groups to the downloadable reference datasets?

SAS
AFR
AMR
ALL or PAN for all populations (handy for multi-ancestry GWAS), and/or a function that would allow combining a given set of downloaded references (e.g. EUR and EAS) into one reference for a GWAS that includes a specific set of ancestral groups
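For the second option, one way to combine downloaded per-population references is to pool allele frequencies, weighting each population by its number of samples. This is a minimal sketch that assumes diploid genotypes with no missing calls, so each population contributes 2N alleles. The function name `pooled_alt_frequency` is hypothetical and not part of gwaslab. Real reference files would also need their variants matched (same chromosome, position, REF and ALT) before pooling.

```python
from typing import Mapping


def pooled_alt_frequency(afs: Mapping[str, float], n_samples: Mapping[str, int]) -> float:
    """Pool per-population ALT allele frequencies into one frequency.

    Hypothetical helper. Each population is weighted by its allele count
    (2 * number of diploid samples), assuming no missing genotypes.
    """
    total_alleles = 0
    alt_alleles = 0.0
    for pop, af in afs.items():
        if not 0.0 <= af <= 1.0:
            raise ValueError(f"allele frequency for {pop} out of range: {af}")
        n_alleles = 2 * n_samples[pop]  # diploid: two alleles per sample
        alt_alleles += af * n_alleles   # expected ALT allele count in this population
        total_alleles += n_alleles
    if total_alleles == 0:
        raise ValueError("no samples to pool")
    return alt_alleles / total_alleles
```

With hypothetical frequencies of 0.10 in 100 EUR samples and 0.30 in 300 EAS samples, `pooled_alt_frequency({"EUR": 0.10, "EAS": 0.30}, {"EUR": 100, "EAS": 300})` returns 0.25, the same frequency you would get by counting alleles across all 400 samples at once.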

swvanderlaan commented 10 months ago

Happy to help create the datasets if you tell me exactly how to do this.

Cloufield commented 10 months ago

Hi, thanks for the suggestion! The raw 1KG dataset was processed using the code described in https://cloufield.github.io/gwaslab/Reference/#1000-genome-projecthg19, so you can easily prepare the dataset for your purpose. I will soon update the datasets for the populations you mentioned (previously the datasets were hosted on Dropbox with limited storage, and we recently upgraded the storage).
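The linked gwaslab page documents the actual processing steps. As a hedged illustration of one early step only, this sketch selects the sample IDs belonging to chosen super-populations from a 1KG sample panel file. The resulting list could then be passed to a subsetting tool such as `bcftools view -S`. It assumes a tab-separated panel with a header row containing `sample` and `super_pop` columns. The function names and the demo rows are hypothetical, and this is not necessarily how gwaslab's own pipeline selects samples.

```python
import csv
import io
from typing import Iterable, List, TextIO


def samples_for_superpops(panel: TextIO, superpops: Iterable[str]) -> List[str]:
    """Return IDs of samples whose super_pop is one of `superpops`, in file order.

    Assumes a tab-separated panel whose header includes 'sample' and
    'super_pop' columns (layout assumption, hypothetical helper).
    """
    wanted = set(superpops)
    reader = csv.DictReader(panel, delimiter="\t")
    return [row["sample"] for row in reader if row["super_pop"] in wanted]


def write_sample_list(sample_ids: Iterable[str], out: TextIO) -> None:
    """Write one sample ID per line, the format accepted by `bcftools view -S`."""
    for sample_id in sample_ids:
        out.write(f"{sample_id}\n")


if __name__ == "__main__":
    # Synthetic demo rows; a real run would open the downloaded panel file instead.
    demo_panel = io.StringIO(
        "sample\tpop\tsuper_pop\tgender\n"
        "SAMPLE_1\tPOP_A\tEUR\tfemale\n"
        "SAMPLE_2\tPOP_B\tEAS\tmale\n"
        "SAMPLE_3\tPOP_C\tAFR\tfemale\n"
    )
    selected = samples_for_superpops(demo_panel, ["EUR", "EAS"])
    buffer = io.StringIO()
    write_sample_list(selected, buffer)
    print(buffer.getvalue(), end="")
```

Passing a list such as `["EUR", "EAS"]` gives the combined-ancestry subset; passing all super-population codes would approximate an ALL/PAN selection.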

swvanderlaan commented 10 months ago

Thanks. I am making those files now, as per the instructions. It does take a while... :-)

swvanderlaan commented 10 months ago

OK. I have them all now, except for the PAN/ALL dataset, i.e. the one including all 1000G variants, to use as a reference for trans-ancestry analyses. The last step (merging) takes up a lot of intermediate/temporary disk space, which I don't have at the moment. Did you happen to upload that one, by any chance?

Cloufield commented 10 months ago

Hi, sorry for the late reply. I have updated the reference datasets to include all populations as well as the PAN datasets. (Merging the datasets did indeed take very long and a lot of space...) Since the Dropbox links have changed, I have also updated the link parsing. To download the new datasets, please update to v3.4.24.

Please note that the PAN datasets are large (>10 GB).
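Because the PAN datasets exceed 10 GB, and merging needs extra temporary space as noted earlier in the thread, it can help to check free disk space before starting. This is a minimal sketch using Python's standard `shutil.disk_usage`. The helper name `has_free_space` and the 20% default margin are hypothetical choices, not gwaslab settings, and the space actually needed for merging may be considerably larger than the download itself.

```python
import shutil

GB = 10 ** 9  # decimal gigabyte, matching the ">10 GB" wording


def has_free_space(path: str, required_bytes: int, margin: float = 0.2) -> bool:
    """Return True if the filesystem containing `path` has at least
    `required_bytes` free plus a fractional safety margin (hypothetical helper).
    """
    if required_bytes < 0 or margin < 0:
        raise ValueError("required_bytes and margin must be non-negative")
    free = shutil.disk_usage(path).free  # bytes available on that filesystem
    return free >= required_bytes * (1 + margin)


if __name__ == "__main__":
    # 10 GB is only the stated lower bound for the PAN datasets; adjust as needed.
    target_dir = "."
    if has_free_space(target_dir, 10 * GB):
        print("Free space check passed for 10 GB plus margin.")
    else:
        print("Not enough free space; choose another directory.")
```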

swvanderlaan commented 10 months ago

Oh wow! Thanks a lot!