galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Code Files / Retrieving a Dynamic List of Options #2712

Open hexylena opened 8 years ago

hexylena commented 8 years ago

I can't remember where I came across this but I read somewhere that the <code> file tag is deprecated and I was just wondering if anyone knew what it was being replaced with? I use it to get a list of items from a remote service to display in a select list for two of my tools.

I have a similar issue to the one Katherine raised on the mailing list. We discussed code files briefly at the IUC meeting at GCC, and a number of devs admonished quite strongly that we should not use them; instead, we should provide examples which can be used to build that functionality.

We have dynamic, remote services which can offer up lists of options that would go in a select.

A concrete use case is my apollo tooling:

peterjc commented 8 years ago

Currently the Galaxy validators only work on one parameter at a time. Thus my use case in the MIRA 4.0 assembler wrapper:

https://github.com/peterjc/galaxy_mira/blob/master/tools/mira4_0/mira4_de_novo.xml#L10

...
    <code file="mira4_validator.py" />
...
                    <!-- min/max validation is done via the <code> tag -->
                    <param name="min_size" type="integer" optional="true" min="0" value=""
                           label="Minimum size of 'good' DNA templates in the library preparation"
                           help="Optional, but if used you must also supply a maximum value." />
                    <param name="max_size" type="integer" optional="true" min="0" value=""
                           label="Maximum size of 'good' DNA templates in the library preparation"
                           help="Optional, but if used you must also supply a minimum value." />

This calls https://github.com/peterjc/galaxy_mira/blob/master/tools/mira4_0/mira4_validator.py which requires either both max and min (with 0 < min < max), or neither of these optional parameters.
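The rule itself (both values or neither, with 0 < min < max) can be sketched in a few lines of Python. This is an illustration only, not the contents of mira4_validator.py; the function name `check_size_range` and the error messages are assumptions made for the example.

```python
from typing import Optional


def check_size_range(min_size: Optional[int], max_size: Optional[int]) -> Optional[str]:
    """Return an error message, or None if the pair of values is acceptable.

    Hypothetical helper: None stands for an optional parameter left blank.
    """
    if min_size is None and max_size is None:
        return None  # neither supplied: acceptable
    if min_size is None or max_size is None:
        return "Supply both a minimum and a maximum size, or neither."
    if not 0 < min_size < max_size:
        return "Sizes must satisfy 0 < minimum < maximum."
    return None
```

A validator that sees both parameters at once only needs this one check; the difficulty is purely that the per-parameter validators never see the other value.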

hexylena commented 8 years ago

@peterjc mind if I move your comment into a separate issue?

bgruening commented 8 years ago

@erasche for your special use-case I would argue that a data-source for apollo would be the way to go, very similar to the UCSC one.

hexylena commented 8 years ago

@bgruening they need to be workflow compatible. Right now they are. If they became a data source this would break this useful feature.

(If a feature for this existed, I could have a selector between "interactive" and "workflow compat, manual name specification" modes)

bgruening commented 8 years ago

If this is part of a workflow, this workflow is not reproducible, as your input source can change at any time. I see your use case; I just argue that data inputs should come as the first step. For example, Alice should get all IDs of an organism from Apollo via a data source, and your workflow should extract the correct one inside the workflow.

peterjc commented 8 years ago

@erasche Sure, we can move my example to a separate issue for use cases for the <code> tag.

hexylena commented 8 years ago

Thanks @peterjc, we all have lots of (ab)uses for this tag, glad people are finally documenting them :) (Whenever I finish drafting this reply I'll move yours if you haven't already.)

@bgruening Yes, this is not reproducible within a workflow. Adding an organism would mutate the list that would be retrieved. From a "must be perfectly reproducible" standpoint this is not good, but from a UX standpoint (which I often find more important than technical perfection), I think this would be extremely beneficial, even over a data source/fetch data step.

Fetch data:

Unreproducible dynamic list:

nsoranzo commented 8 years ago

@erasche At the IUC meeting we decided to collect all these use cases of <code>, it may make sense to have them all here or create a tracking issue.

hexylena commented 8 years ago

@nsoranzo will launch a tracking issue. I think they are varied enough to separate and solve individually (or at least that seemed to be the devteam opinion)

nsoranzo commented 8 years ago

@erasche Cool, thanks!

nsoranzo commented 8 years ago

A somewhat similar use case (dynamic options), but a reproducible one, is https://github.com/galaxyproject/tools-devteam/blob/master/tools/cummerbund/cummeRbund.xml, where options are retrieved from a SQLite dataset.

hrhotz commented 8 years ago

I don't wanna make enemies here, I just wanna put in my two pennies worth:

I know the use of the code tag is bad, and it might result in non-reproducible cases. However, it is a very convenient and easy way of connecting Galaxy with existing infrastructure. And I am sure there are a lot of Galaxy admins out there (managing a local/private server) using this feature.

In a perfect world, we would just use Galaxy for everything. Unfortunately, Galaxy still faces a lot of opposition. One powerful way to promote Galaxy and convince people to put resources into Galaxy is by showing how easily it can interact with data already available, like metadata stored in a LIMS or data files in user directories.

I am open to and looking forward to replacements for the code tag, but for now, it is essential - at least for me.

bgruening commented 8 years ago

@hrhotz no worries. I think every one of us has their own use cases for the code tag. This was the reason the IUC discussed this at the last meeting. The point of this issue is to collect the use cases for which we need the code tag and to implement a better approach that solves each case within the Galaxy tool syntax. For example, @peterjc's use case could be implemented with a new special validator that can operate on multiple parameters.

Please feel free to add your use case here :)

peterjc commented 8 years ago

Thanks @erasche for filing the tracking issue #2714 for <code> use cases, and #2713 for my dual-parameter validation example.

abretaud commented 8 years ago

I have a very similar use case to @erasche's for tripal/chado:

- Bob uses tool A to create an organism record in a sql db. It gets inserted with an auto-increment id
- He needs to input the new organism id in tool B to insert new data referencing it

Having a select box with organism names and ids from the sql db (or rest api) would be great; in fact, users never really need to know that organism X has the id 12.
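A rough sketch of what such a lookup could look like in Python, using an in-memory SQLite database as a stand-in for the real one. The `organism` table, its two columns, and the function name are hypothetical, and the (label, value, selected) tuple shape is my assumption about what a dynamic select would consume:

```python
import sqlite3


def organism_options(conn):
    """Build (label, value, selected) tuples from a hypothetical `organism` table."""
    rows = conn.execute(
        "SELECT organism_id, common_name FROM organism ORDER BY common_name"
    ).fetchall()
    # Show the name to the user; submit the auto-increment id as the value.
    return [(name, str(organism_id), False) for organism_id, name in rows]


# Demo with an in-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE organism ("
    "organism_id INTEGER PRIMARY KEY AUTOINCREMENT, common_name TEXT)"
)
conn.executemany(
    "INSERT INTO organism (common_name) VALUES (?)",
    [("yeast",), ("fruit fly",)],
)
options = organism_options(conn)
```

The user only ever sees the names, while tool B receives the id as the parameter value.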

hexylena commented 8 years ago

Here is another use case that I have for this:

We have a tool that wraps sequin/tbl2asn, which requires inputs like "Name of Author", "Name of Record Owner", and "Name + address of institution".

We currently use a data table for this, with a regular export from our LDAP server, but it is ugly. I would much prefer being able to write some small gateway between the galaxy tool and LDAP, or to just use LDAP queries within the input box.

On this note, I think this specific code file issue should include both a select2 box and search functionality for handling cases where there are more results than we really want to render (I have a tool with some 3k+ options in a select; it loads very slowly for obvious reasons).
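To illustrate the search idea: rather than rendering every option, the backend could return only the first few matches for whatever the user has typed. A minimal sketch, assuming options are plain (label, value) pairs; the function name and the default limit are made up for the example:

```python
from itertools import islice


def search_options(options, query, limit=50):
    """Return at most `limit` (label, value) pairs whose label contains `query`."""
    needle = query.strip().lower()
    # Lazy generator: scanning stops as soon as `limit` matches are found.
    hits = ((label, value) for label, value in options if needle in label.lower())
    return list(islice(hits, limit))


# Stand-in for a select with a few thousand entries.
all_options = [(f"organism {i}", str(i)) for i in range(3000)]
first_hits = search_options(all_options, "organism 29", limit=5)
```

With something like this behind a select2 box, the browser never has to render more than `limit` entries at a time.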

@abretaud if I forget, remind me that I have some chado galaxy tools already written that I need to share.

abretaud commented 8 years ago

IIRC @yvanlebras had a use case with a (long) list of docker images fetched from bioshadock

(@erasche great, looking forward to testing them! I'm polishing/putting online some tripal stuff, more news very soon)

osallou commented 8 years ago

Maybe Galaxy could add support for web components with libraries like Google Polymer to add templated elements. Templates could be added like runner plugins, and defined in xml tool definition with specific tags mapping to a web component template. This would load a template with HTML+javascript doing specific treatments and mapping result to specified tool input.

osallou commented 8 years ago

@abretaud @erasche I have created a PR #2751 to support web components, allowing custom HTML+CSS+JS in templates (so you could define a dropdown that queries a remote service and sets the galaxy input param). It can't be merged yet, and I think it will need quite a lot of discussion, but the POC works.

bgruening commented 5 years ago

Similar use-case reported during the ELIXIR Galaxy workshop in Roscoff.

We would need something similar to data_column but for rows: a dynamically populated list of row names from a given input file.
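As a sketch of the idea only (no such parameter type is described in this thread), the row names could be read from the first column of a tab-separated input. The function name, the header handling, and the (label, value, selected) tuple shape are all assumptions:

```python
import csv
import io


def row_name_options(handle, has_header=True):
    """Return (label, value, selected) tuples from the first column of a tabular file."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the column header line
    # Ignore blank lines and rows whose first cell is empty.
    return [(row[0], row[0], False) for row in reader if row and row[0]]


# Small in-memory table standing in for the selected input dataset.
sample = io.StringIO("gene\tsample1\tsample2\ngeneA\t1\t2\ngeneB\t3\t4\n")
row_options = row_name_options(sample)
```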