galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

[Feature] Split Large Collections with Integer #8330

Open DarianHole opened 5 years ago

DarianHole commented 5 years ago

Hi all,

@Takadonet asked me to look into this after several people in our lab requested a new collection tool that breaks apart large collections. The tool would take a collection and an integer and then split the collection into multiple smaller collections, with the integer setting how many datasets go into each one.

Example: a collection of 10 datasets with an input integer of 5 would produce 2 collections of 5 datasets each.
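
For illustration only, the splitting described above could be sketched in plain Python roughly as follows. This is not the code on my dev branch and it does not use any Galaxy API: the function name `split_collection`, the parameter `chunk_size`, and the choice to put any leftover datasets into a smaller final chunk are all assumptions made for the sketch.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_collection(elements: Sequence[T], chunk_size: int) -> List[List[T]]:
    """Split elements into consecutive chunks of at most chunk_size items.

    Hypothetical helper: the name and the remainder handling are assumptions,
    not part of Galaxy.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    # Step through the input in strides of chunk_size; the last slice may be
    # shorter when the length is not an exact multiple of chunk_size.
    return [
        list(elements[start:start + chunk_size])
        for start in range(0, len(elements), chunk_size)
    ]


# The example from this issue: 10 items, integer 5 -> 2 chunks of 5.
samples = [f"sample_{n}" for n in range(1, 11)]
chunks = split_collection(samples, 5)
print([len(chunk) for chunk in chunks])  # prints [5, 5]
```

With an integer that does not divide the collection evenly (say 10 items and 4), this sketch would give chunks of 4, 4 and 2; whether the real tool should behave that way is open for discussion.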

I have baseline code that does this on my dev branch, and I would like to hear what others think of the idea before I open any PRs and take it further.

Thanks in advance for your time.

bernt-matthias commented 5 years ago

What's the rationale of such a splitting operation?

DarianHole commented 5 years ago

Hi @bernt-matthias,

Sorry for the slow response. I believe some of our users wanted it for cases where they run an analysis on a larger dataset of, say, 1000 samples and then want to take subsets of that collection to run different tests on and/or compare against each other, instead of downloading the data again.

Alternatively, I was asked for it for some phylogenetics work, but I am not sure what they wanted it for in that case.

I can also create it in the Tool Shed if you feel that is more appropriate.

Thanks

bernt-matthias commented 5 years ago

Depending on the use case, creating a nested list might be an option. I guess (not sure) this would be useful if the splitting is meant for processing the parts separately.

I can't say whether the Tool Shed (TS) would be a better place.

Anyway, please go ahead and submit a PR here or at the IUC...

DarianHole commented 5 years ago

Done, pending comments/critiques.