Open BEFH opened 5 months ago
I definitely want to make these tools as easy to use as possible, so that is something I would be interested in. But I would not want to ask for that until the tools have converged to a stable state with limited further updates needed. For now, the easiest solution is likely to use the binaries available here.
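For anyone trying the precompiled binaries, a minimal sketch of pointing bcftools at them follows. It assumes the plugin `.so` files were unpacked into a directory of your choosing; `PLUGIN_DIR` and its default path are placeholders for this example, not something the repository prescribes. `BCFTOOLS_PLUGINS` is the environment variable bcftools reads to locate plugins, and `bcftools plugin -l` lists the ones it can load.

```shell
#!/bin/sh
# Assumption: the downloaded plugin binaries (*.so) live in this directory.
# The path is a placeholder; change it to wherever you unpacked them.
PLUGIN_DIR="${PLUGIN_DIR:-$HOME/bcftools_plugins}"

# bcftools searches the directories listed in BCFTOOLS_PLUGINS for plugins.
export BCFTOOLS_PLUGINS="$PLUGIN_DIR"

# List the plugins bcftools can load, if bcftools is installed.
if command -v bcftools >/dev/null 2>&1; then
    bcftools plugin -l || echo "bcftools could not list plugins in $PLUGIN_DIR" >&2
else
    echo "bcftools not found on PATH" >&2
fi
```

If the plugins from this repository show up in the list, they can be invoked with the usual `bcftools +name` syntax.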
@freeseek Creating a Docker image with bcftools and these binary plugins installed would make them very easy to use for those developing Nextflow and WDL pipelines for both HPC and cloud environments.
You can find Docker images here. The bcftools images include BCFtools/gtc2vcf, BCFtools/mocha, and BCFtools/score but do not include CHOLMOD. The pgs images also include CHOLMOD if you want to run BCFtools/pgs.
Thanks @freeseek! I have downloaded the Docker image with Singularity, and the software looks like a great tool for organizing our GWAS results. It would be helpful for new users to add Docker info to the installation section with some examples:
singularity pull docker docker://mrcieu/gwas2vcf
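A slightly fuller sketch of the same idea, in case it helps others. In `singularity pull`, the optional first positional argument is the output file name, so naming the SIF file explicitly avoids surprises. The file name `gwas2vcf.sif`, the `run` helper, and the `DRY_RUN` switch are all made up for this example; by default the script only prints the commands so it can be checked without Singularity installed.

```shell
#!/bin/sh
# Image taken from the comment above; the SIF file name is an arbitrary choice.
IMAGE_URI="docker://mrcieu/gwas2vcf"
SIF_FILE="gwas2vcf.sif"

# Hypothetical switch: 1 (default) prints commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Convert the Docker Hub image into a local SIF file.
run singularity pull "$SIF_FILE" "$IMAGE_URI"

# Show the metadata of the pulled image.
run singularity inspect "$SIF_FILE"
```

Running it with `DRY_RUN=0` performs the actual pull and then inspects the resulting image.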
One comment on the Docker image: could R/Rscript, with the various libraries needed for running assoc_plot.R, be installed in the image? That would allow the Manhattan plot to be generated from within the container. Currently it fails with an error that the Rscript command is not found.
Including R libraries increases the size of the Docker image dramatically, so I try to keep those separate. However, if you look here you will find r_mocha images that include all the R libraries to run scripts such as assoc_plot.R.
Just to add this for reference, here's how to create / integrate an official bcftools plugin: https://samtools.github.io/bcftools/howtos/plugin.api.html
But from a quick glance at your code, I see that you already implement, for example, process and destroy. So inclusion should be rather quick and easy.
Also, I remember the process being smooth and friendly when I contributed a very small plugin a long time ago: https://github.com/samtools/bcftools/commit/2f4a2b232103bffe673c7eb7f9e2e0304fb55af6
And with devs usually responsive, I would also expect bug fixes to be pretty straightforward. And once it is released through bcftools, users can for example directly use the plugins via the bioconda package of bcftools. This should seriously increase usage, and hopefully also things like citations! ;)
Yes, I am not excluding that down the line. But at the moment I do not want to burden myself and the main developer of BCFtools with additional code that is still actively developed and that might still require multiple updates. The BCFtools/scatter plugin is an example of a plugin I have contributed directly to BCFtools as I don't expect further updates to it. Are you thinking of BCFtools/liftover or all the plugins in this repository?
I landed here while researching the latest liftover tools for a knowledge base and really like the liftover paper. So as you guessed correctly, that would be what I would be most interested in, as I would definitely use it whenever needed and would recommend it in the knowledge base.
And I haven't looked at the other plugins, but in general it is always nice to be able to install any broadly useful functionality directly via (bio)conda. :D
Just for cross-reference, here's the documentation I was talking about: https://github.com/koesterlab/data-science-for-bioinfo/pull/36
And this is where the info ends up: https://koesterlab.github.io/data-science-for-bioinfo/reference_data/liftover.html
It's an open resource, so feel free to point out any errors (or even contribute to the resource in general).
Is there any plan to get these plugins integrated into the defaults for bcftools so we do not have to separately compile with them?