Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

SNPlocs.Hsapiens.dbSNP156.GRCh37 #167

Closed: AndyYang0924 closed this issue 2 months ago

AndyYang0924 commented 1 year ago

Hi, how can I access SNPlocs.Hsapiens.dbSNP156.GRCh37 or SNPlocs.Hsapiens.dbSNP156.GRCh38? Thank you!

Al-Murphy commented 1 year ago

Hey Wenjun,

I don't believe Bioconductor SNPlocs packages for dbSNP 156 have been created yet. @hpages may have more information, but I know it took time to create the dbSNP 155 packages, so I'm not sure of the timeline for this. Sorry Hervé, do you have any thoughts on this?

Thanks, Alan.

hpages commented 1 year ago

Hi Alan, Wenjun,

It might take a while before I get to produce the SNPlocs.Hsapiens.dbSNP156.* packages. The approach I currently use to generate the SNPlocs packages has reached its limits and doesn't scale well with the ever-increasing size of dbSNP. It would need to be revisited, e.g. by splitting the data into smaller packages, by moving it to AnnotationHub, or both.

In the meantime, if you really need SNPlocs.Hsapiens.dbSNP156.GRCh37 now, you can try to forge it using the scripts provided in the SNPlocsForge package. The package lacks documentation, sorry. The scripts for dbSNP156 are in inst/scripts/dbSNP156/. First, manually create the shell of the SNPlocs.Hsapiens.dbSNP156.GRCh37 package (use the SNPlocs.Hsapiens.dbSNP155.GRCh37 package as a template). Then run the following scripts in this order: download_json.sh, extract_snvs_from_RefSNP_json_files.sh, select_GRCh37_snvs.sh, build_GRCh37_OnDiskLongTable.sh.
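A minimal Python sketch of running the four scripts in the order given above, stopping at the first failure. It assumes a local SNPlocsForge checkout, that the package shell has already been created by hand, and that each script can be invoked as `bash <script>` with no arguments from the current working directory. Those are assumptions, so check each script for the arguments, environment variables, or working directory it actually expects. `build_commands` and `run_pipeline` are hypothetical helpers, not part of SNPlocsForge.

```python
"""Hypothetical driver for the SNPlocsForge dbSNP156 GRCh37 scripts.

Assumption: each script runs as `bash <script>` with no arguments.
"""
import subprocess
from pathlib import Path

# Execution order as given in the thread.
GRCH37_STEPS = [
    "download_json.sh",
    "extract_snvs_from_RefSNP_json_files.sh",
    "select_GRCh37_snvs.sh",
    "build_GRCh37_OnDiskLongTable.sh",
]


def build_commands(script_dir):
    """Return one command (as an argument list) per step, in order."""
    script_dir = Path(script_dir)
    return [["bash", str(script_dir / name)] for name in GRCH37_STEPS]


def run_pipeline(script_dir, dry_run=False):
    """Run each step in order; with dry_run=True, only print the commands."""
    commands = build_commands(script_dir)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit,
            # so later steps are never started after a failed one.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical checkout location; adjust to where SNPlocsForge lives.
    run_pipeline("SNPlocsForge/inst/scripts/dbSNP156", dry_run=True)
```

Running with `dry_run=True` first only prints the commands, which is a cheap way to confirm the paths before committing to a multi-day run.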

Note that you'll need a powerful Linux machine to run these scripts. I used a machine with 80 logical CPUs and 384 GB of RAM to forge the SNPlocs.Hsapiens.dbSNP155.* packages, and the scripts took about a week for each package. You'll also need a lot of disk space (roughly 300 to 400 GB).
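As a rough pre-flight check, this sketch compares the current machine against the figures quoted above: 80 logical CPUs, 384 GB of RAM, and about 400 GB of free disk (the upper end of the estimate). These are the reported build setup rather than documented minimums, so a shortfall is a warning, not proof that the build will fail. Reading total RAM through `os.sysconf` works on Linux but not on every platform. `shortfalls` and `current_machine` are hypothetical helper names.

```python
"""Rough comparison of this machine against the resources quoted in the thread."""
import os
import shutil

GB = 1024 ** 3
# Figures from the thread (the dbSNP155 build setup), used as ballpark thresholds.
REQUIRED = {"cpus": 80, "ram_gb": 384, "disk_gb": 400}


def shortfalls(cpus, ram_gb, disk_gb, required=REQUIRED):
    """Return human-readable shortfalls; an empty list means all figures are met."""
    have = {"cpus": cpus, "ram_gb": ram_gb, "disk_gb": disk_gb}
    return [
        f"{key}: have {have[key]:.0f}, want >= {need}"
        for key, need in required.items()
        if have[key] < need
    ]


def current_machine(path="."):
    """Measure logical CPUs, total RAM, and free disk space at `path` (Linux)."""
    cpus = os.cpu_count() or 0
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB
    disk_gb = shutil.disk_usage(path).free / GB
    return cpus, ram_gb, disk_gb


if __name__ == "__main__":
    problems = shortfalls(*current_machine())
    print("\n".join(problems) or "Meets the quoted ballpark figures.")
```

Point `current_machine` at the directory where the downloaded JSON files and intermediate output will be written, since that is where the disk space matters.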

Let me know if you decide to give it a try and I'll do my best to help.

Best, H.

Al-Murphy commented 1 year ago

Hey Herve,

Thanks very much for the explanation. This is not something I have the time or resources to do right now, but I do believe it's important to find a more manageable way to produce these packages for subsequent releases. I'll get in touch with any suggestions on how to do this in the future.

Cheers, Alan.

Al-Murphy commented 1 year ago

Let's leave this open for now, since it has not been addressed in any meaningful way.

Al-Murphy commented 2 months ago

Added a workaround for this, which will be added as a feature soon. See here: https://github.com/Al-Murphy/MungeSumstats/issues/191