Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

Allow use of Ensembl chain #133

Closed. jonathangriffiths closed this 1 year ago.

jonathangriffiths commented 1 year ago

Nice work with this package - it is extremely useful.

As you've noted, the UCSC chain files require a licence for commercial use. Fortunately, Ensembl also provide chain files. The changes I have made allow use of the Ensembl chain files for liftOver.

I haven't done two things:

  1. Remove the UCSC chain files included in the package
  2. Add tests for UCSC/Ensembl result equality. I'm not sure whether the two chain files are meant to give exactly the same results, so a mismatching mapping would not necessarily be an error. That said, I did compare the results from the two chains for a single GWAS I downloaded, and they were identical.
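As a rough illustration of the equality check described in point 2, here is a minimal Python sketch (MungeSumstats itself is an R package, so this is not its API). It assumes each liftover result has been reduced to a dictionary mapping a variant ID to its (chromosome, position) on the target build; `compare_liftover` and `normalise_chrom` are hypothetical names. Because UCSC names chromosomes with a `chr` prefix and Ensembl does not, the sketch strips that prefix before comparing, and it reports variants that only one chain managed to map separately from true position mismatches.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a liftover result:
# variant ID -> (chromosome, position on the target build).
Coord = Tuple[str, int]


def normalise_chrom(chrom: str) -> str:
    """Drop a leading 'chr' so UCSC-style ('chr1') and Ensembl-style ('1') names compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def compare_liftover(ucsc: Dict[str, Coord], ensembl: Dict[str, Coord]) -> dict:
    """Summarise how two liftover results for the same input variants agree."""
    shared = ucsc.keys() & ensembl.keys()
    # Variants mapped by both chains but to different coordinates.
    mismatched: List[str] = sorted(
        v for v in shared
        if (normalise_chrom(ucsc[v][0]), ucsc[v][1])
        != (normalise_chrom(ensembl[v][0]), ensembl[v][1])
    )
    return {
        "shared": len(shared),
        "mismatched": mismatched,
        # Variants that only one chain file managed to map.
        "only_ucsc": sorted(ucsc.keys() - ensembl.keys()),
        "only_ensembl": sorted(ensembl.keys() - ucsc.keys()),
    }


if __name__ == "__main__":
    ucsc = {"rs1": ("chr1", 1000), "rs2": ("chr2", 500), "rs3": ("chrX", 42)}
    ensembl = {"rs1": ("1", 1000), "rs2": ("2", 501)}
    print(compare_liftover(ucsc, ensembl))
    # {'shared': 2, 'mismatched': ['rs2'], 'only_ucsc': ['rs3'], 'only_ensembl': []}
```

Keeping "mapped by only one chain" separate from "mapped differently" matters here, since, as noted above, a difference between the chains is not necessarily a bug.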

Let me know what you think!

Jonny.

Al-Murphy commented 1 year ago

Hey Jonny, thank you very much for this; it is a great help and much appreciated! I will merge this into the master branch (dev version). The only change I will make is to remove the UCSC chain files from the package and add a parameter so users can point to them if they have already downloaded them, or download them directly from UCSC otherwise. I think this will avoid the licensing issues. I will, however, make the Ensembl chain files the default. I understand there could be differences between the two, but I don't think we can say one is the 'right' choice over the other. Just mentioning the related issue: https://github.com/neurogenomics/MungeSumstats/issues/128