bio-tools / biotoolsSchema

biotoolsSchema : Tool description data model for computational tools in life sciences
Creative Commons Attribution Share Alike 4.0 International

Multiple EDAM concepts needed for a single output + operation|data|format HANDLES #2

Open matuskalas opened 8 years ago

matuskalas commented 8 years ago

Yet another example where multiple concepts are needed for one output is Meta-pipe, which generates an annotation of a (meta)genome assembly (contigs) with the protein-coding genes and protein domains found, plus information about them such as taxa, DB hit scores, etc. The single data type chosen, "Protein features", is a very distant generalisation of this, isn't it?

matuskalas commented 8 years ago

In addition, we need to distinguish a single output annotated with multiple concepts (i.e. one output containing various types of data) from multiple outputs of a given operation.

The mess would be cleared up by re-introducing operation, data (parameter), and format "handles". These should come with clear instructions that prevent nonsense population, for example something like "Operation name, command, button, menu item, parameter, or switch/flag".
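To make the distinction above concrete, here is a minimal sketch in Python. It is not the biotoolsSchema model: the class names `Output` and `Function` and their fields are hypothetical, and the concept labels are used purely as illustrative strings. It only shows that "one output with several concepts" and "several outputs with one concept each" are structurally different records.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Output:
    # Hypothetical: EDAM data concept labels describing this one output.
    data: List[str] = field(default_factory=list)


@dataclass
class Function:
    # Hypothetical: one operation with its list of outputs.
    operation: str
    outputs: List[Output] = field(default_factory=list)


# Case A: a single output (one file) that mixes several kinds of data.
single_mixed = Function(
    operation="Genome annotation",
    outputs=[Output(data=["Protein features", "Sequence report"])],
)

# Case B: two separate outputs, each described by exactly one concept.
two_outputs = Function(
    operation="Genome annotation",
    outputs=[Output(data=["Protein features"]), Output(data=["Sequence report"])],
)

print(len(single_mixed.outputs), len(two_outputs.outputs))
```

A model that allows only one concept per output can express case B but not case A, which is the gap described in this comment.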

matuskalas commented 6 years ago

Giving this a nudge ;-)

joncison commented 6 years ago

Now in biotoolsschema_dev.xsd

Operation handle introduced as cmd under function, which is the right place for it in the first instance (i.e. the command-line fragment or option that specifies running the tool in this mode).
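As a rough illustration of where such a handle would sit, the following Python sketch builds a small XML fragment with a `cmd` element under `function`. Only the names `function` and `cmd` come from this comment; the `operation`/`term` layout and the `--annotate` flag are assumptions made up for the example, and the fragment has not been validated against biotoolsschema_dev.xsd.

```python
import xml.etree.ElementTree as ET

# Build a <function> element; child layout other than <cmd> is assumed.
function = ET.Element("function")
operation = ET.SubElement(function, "operation")
ET.SubElement(operation, "term").text = "Genome annotation"

# The operation handle: the command-line fragment that selects this mode.
# "--annotate" is a hypothetical flag, not taken from any real tool.
ET.SubElement(function, "cmd").text = "--annotate"

xml_text = ET.tostring(function, encoding="unicode")
print(xml_text)
```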

joncison commented 6 years ago

As for the other issue (multiple - but currently single - data for a given I/O) there are two cases to distinguish.

  1. Where we have an input or output which can be specified in two ways, say a sequence which can be given as raw sequence or via an identifier. The current advice (http://biotools.readthedocs.io/en/latest/curators_guide.html#data-type-input-and-output-data) is to just specify one (the data term, not the data->identifier). I'm open to changing that, via a trivial update to the model, which is currently: [screenshot: current model]

but could be: [screenshot: proposed model]

  2. The second case is where we have complex data (in a single file or blob) which could be described by more than one EDAM concept. I'm loath to support fine-grained annotation of such data, because I think it opens a can of worms in various ways, and it would also create a usability issue if we went with option 1 above (how would we disambiguate the two cases?). I think the general advice (point 3 of http://biotools.readthedocs.io/en/latest/curators_guide.html#id13) holds: in the example in the OP, the most specific term (currently at least) would be Genome report or Sequence report.
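The disambiguation worry can be sketched as follows, assuming a hypothetical `IO` record that simply allows a list of data concepts per input or output. The class name, its field, and the concept labels are illustrative only and are not part of the actual model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IO:
    # Hypothetical: EDAM data concept labels attached to one input/output.
    data: List[str]


# Case 1: alternatives. The input is EITHER a raw sequence OR an identifier.
alternatives = IO(data=["Sequence", "Sequence identifier"])

# Case 2: composite. The single output contains BOTH kinds of data at once.
composite = IO(data=["Protein features", "Sequence report"])

# Both records have the same shape, so the structure alone cannot tell
# a consumer which of the two readings was intended.
same_shape = (
    type(alternatives) is type(composite)
    and len(alternatives.data) == len(composite.data)
)
print(same_shape)
```

Under these assumptions, a plain list of concepts would need some extra marker to separate "either/or" from "all of these", which is the usability issue raised for option 1.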

cc @hansioan @matuskalas : what do you think?