freeseek / score

Tools to work with GWAS-VCF summary statistics files
MIT License

integration as default bcftools plugins #4

Open BEFH opened 5 months ago

BEFH commented 5 months ago

Is there any plan to get these plugins integrated into the defaults for bcftools so we do not have to separately compile with them?

freeseek commented 5 months ago

I definitely want to make these tools as easy to use as possible, so that is something I would be interested in. But I would not want to ask for that until the tools have converged to a stable state with limited further updates needed. For now, the easiest solution is likely to use the binaries available here.
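
For reference, a minimal sketch of how downloaded plugin binaries are typically made visible to bcftools. It assumes the plugin files were unpacked into a hypothetical directory, `$HOME/bin/bcftools-plugins`; that path is an illustration and is not stated anywhere in this thread. `BCFTOOLS_PLUGINS` is the environment variable bcftools consults when looking for plugins, and `bcftools plugin -l` lists the ones it can load:

```shell
# Hypothetical location where the downloaded plugin .so files were unpacked
# (an assumption for illustration; adjust to wherever you placed them).
plugin_dir="$HOME/bin/bcftools-plugins"

# bcftools searches the directories listed in BCFTOOLS_PLUGINS for plugins.
export BCFTOOLS_PLUGINS="$plugin_dir"

# List the plugins bcftools can find, but only if bcftools is on the PATH.
if command -v bcftools >/dev/null 2>&1; then
    bcftools plugin -l || echo "no loadable plugins found in $plugin_dir" >&2
else
    echo "bcftools not found on PATH" >&2
fi
```

If the plugins from this repository show up in that listing, they can be invoked with the usual `bcftools +name` syntax.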

jjfarrell commented 4 months ago

@freeseek Creating a docker image with bcftools and these binary plugins installed would make them very easy to use for those developing Nextflow and WDL pipelines for both HPC and cloud environments.

freeseek commented 4 months ago

You can find docker images here. The bcftools images include BCFtools/gtc2vcf, BCFtools/mocha, and BCFtools/score but do not include CHOLMOD. The pgs images also include CHOLMOD, in case you want to run BCFtools/pgs.

jjfarrell commented 3 months ago

Thanks @freeseek! I have downloaded the docker image with singularity, and the software looks like a great tool for organizing our GWAS results. It would be helpful for new users if docker info were added to the installation section, with some examples:

singularity pull docker docker://mrcieu/gwas2vcf

jjfarrell commented 3 months ago

One comment on the docker image: could R/Rscript, with the various libraries needed for running assoc_plot.R, be installed in the docker image? That would allow the Manhattan plot to be generated from the docker image. Currently it fails with an error that no Rscript command is found.

freeseek commented 3 months ago

Including R libraries increases the size of the docker image dramatically, so I try to keep those separate. However, if you look here you will find r_mocha images that include all the R libraries needed to run scripts such as assoc_plot.R.

dlaehnemann commented 2 months ago

Just to add this for reference, here's how to create / integrate an official bcftools plugin: https://samtools.github.io/bcftools/howtos/plugin.api.html

But from a quick glance at your code, I see that you already implement, for example, process and destroy. So inclusion should be rather quick and easy.

Also, I remember the process being smooth and friendly when I contributed a very small plugin a long time ago: https://github.com/samtools/bcftools/commit/2f4a2b232103bffe673c7eb7f9e2e0304fb55af6

And with devs usually responsive, I would also expect bug fixes to be pretty straightforward. And once it is released through bcftools, users can for example directly use the plugins via the bioconda package of bcftools. This should seriously increase usage, and hopefully also things like citations! ;)

freeseek commented 2 months ago

Yes, I am not ruling that out down the line. But at the moment I do not want to burden myself and the main developer of BCFtools with additional code that is still actively developed and that might still require multiple updates. The BCFtools/scatter plugin is an example of a plugin I have contributed directly to BCFtools, as I don't expect further updates to it. Are you thinking of BCFtools/liftover or all the plugins in this repository?

dlaehnemann commented 2 months ago

I landed here while researching the latest liftover tools for a knowledge base and really like the liftover paper. So as you guessed correctly, that would be what I would be most interested in, as I would definitely use it whenever needed and would recommend it in the knowledge base.

And I haven't looked at the other plugins, but generally any functionality that is general is always nice to be able to just install via (bio)conda. :D

dlaehnemann commented 2 months ago

Just for cross-reference, here's the documentation I was talking about: https://github.com/koesterlab/data-science-for-bioinfo/pull/36

And this is where the info ends up: https://koesterlab.github.io/data-science-for-bioinfo/reference_data/liftover.html

It's an open resource, so feel free to point out any errors (or even contribute to the resource in general).