bioconda / bioconda-recipes

Conda recipes for the bioconda channel.
https://bioconda.github.io
MIT License

A meta-package to install all UCSC kent tools? #43618

Open maximilianh opened 11 months ago

maximilianh commented 11 months ago

Hi @dpryan79, various people e.g. @diekhans and @mrvollger have asked if there was a conda package to install all the kent tools. I don't know how to figure that out. Maybe there is already a meta-package that pulls in all the kent-ucsc-tools? Or a command that I am missing? Thanks!

mrvollger commented 11 months ago

So I looked into it, and it is too large (~3.8GB) to install all at once, but I think we could have larger sets of related utilities that could be jointly installed. For example, all bed utilities and all bigBed utilities.

I made a PR that does these two here: https://github.com/bioconda/bioconda-recipes/pull/43664

maximilianh commented 11 months ago

Hi Mitchell, hm, is 4GB really too large these days? Interesting. My watch has almost 10 times more... :-)

Great to see the PR! Would you also want to add the tools hubCheck and the bigWig* tools? We have lists of tools by "importance" and the group may come up with somewhat more extended lists, e.g. anything important for making track hubs: faToTwoBit, bptForTwoBit, hubCheck, bedGraphToBigWig, wigToBigWig, fetchChromSizes, bigGuessDb, pslToBigPsl.

I personally use featureBits and overlapSelect a lot and for alignments, in addition to blat, a few other tools, e.g. pslReps and pslCDnaFilter, pslSort, pslMap, pslUniq, pslToChain, chainToAxt, chainToPsl, liftUp, pslPretty, pslPairs.

diekhans commented 11 months ago

Maybe the limit could be appealed?

mrvollger commented 11 months ago

@dpryan79 and @johanneskoester, is it possible to have the resource limit increased so we could make this one package? It would be really useful, I think.

bgruening commented 10 months ago

I don't think this is possible at the moment. It's a lot of data if every CI job wants to allocate that. We could try disabling the container tests and builds, which would reduce storage needs ... or maybe I could just build this meta-package locally for you. It's a one-time thing, isn't it? One that always pulls the latest kent tools?

peterjc commented 7 months ago

How many misc tools are there outside of all bed utilities and all bigBed utilities?

e.g. isPcr

mrvollger commented 7 months ago

I think there are a little over 300 tools in the Kent/ucsc utils package.

(and 42 of those have {bed,Bed} in the name)
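A count like this can be reproduced by matching name fragments against the tool list. The following is a minimal sketch, assuming only the tool names mentioned in this thread (not the full list of 300+) and a hypothetical set of family patterns; the numbers it produces describe this sample, not the whole package.

```python
from collections import Counter

# Tool names mentioned in this thread only; the real package has 300+ tools.
TOOLS = [
    "faToTwoBit", "bptForTwoBit", "hubCheck", "bedGraphToBigWig",
    "wigToBigWig", "fetchChromSizes", "bigGuessDb", "pslToBigPsl",
    "featureBits", "overlapSelect", "blat", "pslReps", "pslCDnaFilter",
    "pslSort", "pslMap", "pslUniq", "pslToChain", "chainToAxt",
    "chainToPsl", "liftUp", "pslPretty", "pslPairs", "isPcr",
]

# Hypothetical family patterns (an assumption for illustration),
# matched case-insensitively so that "bed" also covers "Bed".
FAMILIES = {"bed": "bed", "bigWig": "bigwig", "psl": "psl", "chain": "chain"}


def count_by_family(tools, families):
    """Count how many tool names contain each family's substring."""
    counts = Counter()
    for tool in tools:
        lowered = tool.lower()
        for family, needle in families.items():
            if needle in lowered:
                counts[family] += 1
    return counts


counts = count_by_family(TOOLS, FAMILIES)
for family in FAMILIES:
    print(f"{family}: {counts[family]}")
```

A tool can fall into more than one family (for example, bedGraphToBigWig matches both bed and bigWig), so any split into grouped packages would need a rule for where such tools go.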

maximilianh commented 7 months ago

I think most of them get very little use, and I bet that we can cover 90% of uses with around 30-40 tools.

mrvollger commented 7 months ago

I think that is true. However, if we don't bundle them in some way, then we might need hundreds of different bioconda PRs every time there is an update from your team (UCSC).

I am not against making a couple of PRs to split the popular tools and then all the others or some combination.
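One way to avoid hand-editing many recipes is to generate a meta-package recipe from a plain list of tools. The sketch below renders a `meta.yaml` for the track hub tools listed earlier in this thread. The package name `ucsc-trackhub-tools`, the placeholder version, and the naming rule (`ucsc-` plus the lowercased tool name) are assumptions for illustration; whether each tool has a matching package on the channel would need to be checked.

```python
# Track hub tools as listed earlier in this thread.
TRACK_HUB_TOOLS = [
    "faToTwoBit", "bptForTwoBit", "hubCheck", "bedGraphToBigWig",
    "wigToBigWig", "fetchChromSizes", "bigGuessDb", "pslToBigPsl",
]


def conda_name(tool):
    """Map a tool name to a package name (assumed naming convention)."""
    return "ucsc-" + tool.lower()


def render_meta_yaml(name, version, tools):
    """Render a minimal meta-package recipe that only lists run dependencies."""
    lines = [
        "package:",
        f"  name: {name}",
        f'  version: "{version}"',
        "",
        "build:",
        "  number: 0",
        "  noarch: generic",
        "",
        "requirements:",
        "  run:",
    ]
    lines += [f"    - {conda_name(tool)}" for tool in sorted(tools)]
    return "\n".join(lines) + "\n"


# Hypothetical package name and placeholder version.
recipe = render_meta_yaml("ucsc-trackhub-tools", "0.0.0", TRACK_HUB_TOOLS)
print(recipe)
```

Because such a package would contain no files of its own and only pull in dependencies, a new UCSC release would mean regenerating one small recipe per group rather than writing each one by hand.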

maximilianh commented 7 months ago

> I think that is true. However, if we don't bundle them in some way then we might need 100s of different bioconda PRs everytime there is an update from your team (UCSC).

Sorry, I don't know what you mean. Isn't this ticket about bundling them?

> I am not against making a couple of PRs to split the popular tools and then all the others or some combination

Let me reformulate this: I'm pretty sure that there are a lot of tools that are used by 0 pipelines out there.

But maybe it's easier just to increase the limit for once for this package and waste some storage and break a rule, but save a lot of time?
