daler / pybedtools

Python wrapper -- and more -- for BEDTools (bioinformatics tools for "genome arithmetic")
http://daler.github.io/pybedtools

Has pybedtools considered packaging bedtools? #401

Closed pettyalex closed 3 months ago

pettyalex commented 6 months ago

Hello,

Have you considered packaging bedtools with pybedtools, so that it's installable and usable as a Python package directly, without having to install bedtools separately? Bundling would also guarantee compatibility between the two, in addition to making installation easier.

I've seen many Python packages successfully distribute their native dependencies, such as https://github.com/pysam-developers/pysam and https://github.com/indygreg/python-zstandard, among many others. bedtools is license-compatible (also MIT), so it should be possible to compile bedtools as part of the pybedtools build. This would allow building complete, ready-to-use Python wheels of pybedtools, which would make distribution through all sorts of channels, including PyPI, easier.

Thoughts? I'd be willing to investigate this and contribute to the work, but if the idea has been considered before and rejected I don't want to waste effort.

To give a little more context, the pain point that led me to create this issue is that the newest native bedtools build available through lmod on my institution's SLURM cluster is quite old and was built with an ancient GCC toolchain, so using pybedtools in this environment would require a quite old version of Python. Clearly the fix is to ask for a newer build of bedtools, but issues like this could be avoided entirely if pybedtools bundled bedtools and were installable from pip in a complete form.
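For anyone hitting the same situation, here is a minimal sketch of checking which bedtools build is actually being picked up from `PATH`. It assumes `bedtools --version` prints a line such as `bedtools v2.30.0`; the helper names `parse_bedtools_version` and `installed_bedtools_version` are made up for illustration and are not part of pybedtools.

```python
import re
import shutil
import subprocess


def parse_bedtools_version(text):
    """Extract (major, minor, patch) from output such as 'bedtools v2.30.0'.

    Returns None if no version string is found. A missing patch
    component (e.g. 'v2.26') is reported as 0.
    """
    match = re.search(r"v(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) if part else 0 for part in match.groups())


def installed_bedtools_version():
    """Return the version of the bedtools found on PATH, or None if absent."""
    exe = shutil.which("bedtools")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return parse_bedtools_version(result.stdout)
```

Calling `installed_bedtools_version()` after `module load bedtools` (or after activating a conda environment) shows which build would be used in that session.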

daler commented 6 months ago

Thanks for the suggestion. In prior years, bedtools development happened rather quickly, so it made much more sense to decouple bedtools and pybedtools while supporting as many bedtools versions as practical.

Many (most?) users would install them both via conda, and have the flexibility to choose the bedtools version that worked best with the rest of their workflow. This is what my group does with our production workflows on the slurm cluster, for example.

The reason I'm still hesitant to include bedtools is that bundling it forces the user into a particular version of bedtools. Bedtools development is quite stable nowadays, but to confidently make the call to bundle, we'd need to survey what everyone's using.

The best information I know of for that is https://bioconda.github.io/recipes/bedtools/README.html, which shows that a mixture of versions is still being downloaded regularly, in non-trivial numbers.

Bundling might be OK if there were a clear mechanism for toggling between the bundled version and an externally available one. Having multiple versions floating around, though, might be confusing to some users.
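As a rough illustration of what such a toggle could look like, here is a minimal sketch using only the standard library. Everything named here is hypothetical: the environment variable `PYBEDTOOLS_USE_BUNDLED`, the function `resolve_bedtools`, and the idea of a bundled binary directory do not exist in pybedtools today.

```python
import os
import shutil
from pathlib import Path

# Hypothetical toggle: set to "0" to prefer an external bedtools over the bundled one.
ENV_TOGGLE = "PYBEDTOOLS_USE_BUNDLED"


def resolve_bedtools(bundled_dir, env=None):
    """Pick a bedtools executable, honoring the (hypothetical) toggle.

    bundled_dir: directory where a bundled bedtools binary would live.
    env: mapping of environment variables (defaults to os.environ).
    Returns the chosen path as a string, or None if no bedtools is found.
    """
    env = os.environ if env is None else env
    bundled = Path(bundled_dir) / "bedtools"
    bundled_path = str(bundled) if bundled.is_file() else None
    external_path = shutil.which("bedtools", path=env.get("PATH"))

    # Bundled is preferred unless the toggle is explicitly "0".
    prefer_bundled = env.get(ENV_TOGGLE, "1") != "0"
    candidates = [bundled_path, external_path]
    if not prefer_bundled:
        candidates.reverse()

    # Fall back to the other candidate if the preferred one is missing.
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None
```

With this design the toggle only sets a preference: if the preferred copy is missing, the other one is still used, so an existing conda or lmod setup keeps working. Whether a silent fallback is less confusing than a hard error is one of the points that would need discussion.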

So, I'm still on the fence here. If you want to take a stab at seeing what it would take to bundle then by all means go for it, but I'd want to discuss and think more about pulling the trigger on it.

In the meantime, is conda an option for you on the cluster?

dbolser commented 3 months ago

Could this be assigned a label, e.g. feature request?

daler commented 3 months ago

@pettyalex please reopen if you'd like to discuss further!