Closed by nekrut.
While I agree that such functionality is needed as a worst-case fallback, I do think that in this case the more correct thing would be to create two collections, one for your signal and one for your control. IMHO we should only aggregate files that belong together functionally.
Then we really need to allow multiple selection of collections in tools, so you can run on multiple collections at once.
Record types help a lot here. In my CWL ChIP-seq workflow I have a list of replicates, each of which is a record of treatment and control, each of which is an (optionally paired) fastq. I think this leads to the most natural representation of the workflow.
@jmchilton and I have discussed this, and he is going to rough out an idea of record types for Galaxy (which would presumably subsume the current "paired" collection).
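To make the record-type idea concrete, here is a minimal Python sketch of the structure described above: a list of replicates, each a record with named treatment and control fields, each holding optionally paired FASTQ reads. The names `Reads`, `Replicate`, and `peak_calling_pairs` and the file names are hypothetical. This is plain Python dataclasses, not Galaxy's or CWL's actual record-type API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Reads:
    """FASTQ input; reverse is None for single-end data (hypothetical model)."""
    forward: str
    reverse: Optional[str] = None

    @property
    def paired(self) -> bool:
        return self.reverse is not None


@dataclass(frozen=True)
class Replicate:
    """One ChIP-seq replicate: a record with named treatment and control fields."""
    name: str
    treatment: Reads
    control: Reads


def peak_calling_pairs(replicates: List[Replicate]) -> List[Tuple[str, Reads, Reads]]:
    """Return (name, treatment, control) for each replicate.

    Roles are named fields rather than positions in a flat list, so no
    guessing is needed to tell treatment from control.
    """
    return [(r.name, r.treatment, r.control) for r in replicates]


replicates = [
    Replicate("rep1",
              treatment=Reads("rep1_chip_R1.fastq", "rep1_chip_R2.fastq"),
              control=Reads("rep1_input_R1.fastq", "rep1_input_R2.fastq")),
    Replicate("rep2",
              treatment=Reads("rep2_chip.fastq"),
              control=Reads("rep2_input.fastq")),
]

for name, treatment, control in peak_calling_pairs(replicates):
    print(name, treatment.forward, control.forward, treatment.paired)
```

Because treatment and control travel together inside each record, a step such as MACS can receive each pair directly, which is exactly what gets lost when a flat collection is exploded interactively.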
Aha, yes, but in the short term, splitting a collection would be nice.
But last resort ;)
(Because it is hard or impossible to do reusably or reproducibly: when you explode a collection interactively, you don't know which elements are treatment and which are control, so you can't build a workflow from it. Record types address this.)
No clue about the client side, but I think selecting multiple collections at once would help you here more than this last-resort tool.
Yes, indeed. So multiple select, then.
Oh, don't close it. It's a last resort, but still worth having.
Similar to https://github.com/galaxyproject/galaxy/issues/740, any solution for this would solve my issue as well :D
We find that being able to eject or split collections would greatly improve our biologists' ability to do their work. A specific example is the SNVPhyl workflow (https://snvphyl.readthedocs.io/en/latest/), where all samples are used in a single analysis. Sometimes the only way to know whether one or more samples have to be removed from the collection is at the end of the workflow.
The end user then has to re-make the collection without those samples and re-run. The problem is that sometimes they have to remove dozens or hundreds of samples by hand. Since paging was added (so happy about that!), it is almost impossible to select all the files again if there are more than 500 files in total.
I don't think the 'eject' tool needs to be usable during workflow execution, but it should be available.
@Takadonet Can you use the filter failed tool to automate this? Or put another way - how are users selecting these datasets?
@jmchilton Based on the output of either a phylogenomic tree or values in a secondary dataset. For example, all samples with less than 60% identity to the reference should be removed.
@Takadonet Can you implement a tool that will just fail outputs that don't meet these criteria and then use the "filter failed" tool?
If it makes sense for your workflow to have a human involved, that is totally fine, but I'm always looking for guinea pigs to try out new workflow functionality.
@jmchilton It seems to me that both cases are needed. One is where a human is involved; to me, that should use the same interface as creating a new collection, so it is consistent.
The other case should be a tool similar to the ones already in the base Galaxy codebase, i.e. merge collection, unzip, zip, etc. There is no point in having it as a normal Tool Shed tool because of the duplication of datasets. Having the new tool available during workflow execution would be awesome, but difficult to implement for sure.
We are always up for being guinea pigs!
@Takadonet Good points. I have a PR that adds a filtering option that works without dataset duplication: https://github.com/galaxyproject/galaxy/pull/3940. Hopefully it will be in 17.05; then all you would need to do is write a tool that looks at whatever metadata is interesting and builds a list of identifiers for only those elements you wish to keep.
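A minimal sketch of such an identifier-selecting tool, assuming a hypothetical tab-separated input with one sample identifier and its percent identity to the reference per line, and assuming the filtering step accepts a plain-text file with one identifier to keep per line. The 60% threshold comes from the SNVPhyl example earlier in the thread; the function names and file layout are illustrative only, not part of Galaxy.

```python
import csv
import io
from typing import Iterable, List, TextIO

# Threshold from the example in this thread: samples below 60% identity are dropped.
MIN_IDENTITY = 60.0


def identifiers_to_keep(rows: Iterable[List[str]], min_identity: float = MIN_IDENTITY) -> List[str]:
    """Return identifiers whose percent identity meets the threshold.

    Assumes hypothetical two-column rows: identifier, percent identity.
    """
    keep = []
    for identifier, identity in rows:
        if float(identity) >= min_identity:
            keep.append(identifier)
    return keep


def write_identifier_file(identifiers: Iterable[str], handle: TextIO) -> None:
    """Write one identifier per line (the assumed input format for the filter step)."""
    for identifier in identifiers:
        handle.write(identifier + "\n")


if __name__ == "__main__":
    # Hypothetical per-sample statistics; a real tool would read these from a dataset.
    stats = io.StringIO("sample_A\t98.2\nsample_B\t41.7\nsample_C\t60.0\n")
    rows = csv.reader(stats, delimiter="\t")
    out = io.StringIO()
    write_identifier_file(identifiers_to_keep(rows), out)
    print(out.getvalue(), end="")  # sample_A and sample_C pass; sample_B is dropped
```

Keeping the selection logic separate from the collection handling means the criterion can be swapped (tree-based, or any other metadata) while the filtering itself stays in Galaxy, without duplicating datasets.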
@jmchilton We'll probably cherry-pick it into our current Galaxy instances ASAP. We have lots of users who would be interested, for sure.
Sorry, I'm a little lost among the multiple issues on this topic: What is the current status of being able to run tools on subsets of collections? If that isn't possible, is there a way to "eject" collections into a set of individual history items?
I would like to be able to copy a few datasets from a list of datasets. Specifically, I have a list of over mzML datasets, and I want to extract the dozen that represent the pooled samples. In the History UI, I can choose "Copy Datasets" and choose from the datasets in the history, but when I click on my list dataset so that its contents are revealed and the rest of the history is hidden (i.e., the history pane says "back to (my history)" and "a list with (count) items"), when I choose "Copy Datasets", it shows the datasets in the enclosing history.
Having "eject" would give me a workaround at least. Alternatively, if "Copy Datasets" worked for choosing members from list contents, then copying the members to the enclosing history would have the same effect as eject. Right now my only choice is to download (or find my original files) and upload.
@dannon I thought that it made better sense to comment here than to open a new issue since this seems so closely related.
@nekrut the phrasing in your original post described a very interactive use case, so is it now resolved by https://github.com/galaxyproject/galaxy/pull/7553?
I think this is mostly solved with the ability to filter by element identifier, and to interactively select in the tool form. I'm going to close this but please let me know if it's still not resolved and we should re-open.
In some cases it is necessary to gain access to collection elements individually. For example, in my ChIP-seq analysis I initially bundle all data (signal and control) together into a single collection to pre-process, map, and post-process. However, when I run MACS it requires me to load signal and control separately. To enable this it would be necessary to have one of these: