Open joncison opened 7 years ago
cc @hansioan @baileqi @ekry @matuskalas
I'm somewhat loath to change this, because there would be many knock-on consequences for the various UIs that adhere to the model.
What we have currently (which cannot cope with the "raw sequence" or "sequence identifier" input scenario above):
From an XSD perspective it can easily enough be "fixed", thus:
Note you can now specify multiple pairs of data+format for a given input. But as I say, I'm loath to do so because of the knock-on effects (UIs, API ...). I'm probably, at this stage, leaning towards not making this change, but I'm not sure.
Thoughts please ...
Latest thoughts on this (and 90% sure to be included in biotoolsSchema 3.0.0 thus bio.tools) are here.
👍 Well, you know my thoughts on this, as they haven't changed :-) (see also related but different #2)
Still, I don't understand your suggestion (XSD change) in https://github.com/bio-tools/biotoolsSchema/issues/83#issuecomment-341667924 (i.e. https://user-images.githubusercontent.com/1506863/32369707-1e621a3e-c082-11e7-8ee8-2921dccb4f3f.PNG). (If you mean <xs:sequence maxOccurs="unbounded">, then I wouldn't suggest it as a simple hack, because it's generally not recommended and hard to parse. And it probably won't work in JSON at all.)
A couple of options, in order of sophistication:
1. Loath to change and fear of getting repetitive requests.
2. Add an option of an "OR" logic between inputs (and outputs). Implementable in various ways. (Let me think in the meantime about one or more simple ways.)
3. Allow multiple EDAM Data concepts for one input/output (fixing the related part of #2), and add a separate "OR" logic as mentioned in 2.
I'd very much suggest either 1. or 3., i.e. either all, or nothing (and all in the future).
Implementation suggestions (without <xs:sequence maxOccurs="unbounded">):
a) Simple option with a new mandatory element parameter:
    <xs:element name="input" minOccurs="0" maxOccurs="unbounded">
      <xs:complexType>
        <xs:choice>
          <xs:sequence>
            <xs:element name="or" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="parameter" type="dataType" minOccurs="2" maxOccurs="unbounded"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
            <xs:element name="parameter" type="dataType" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
          <xs:element name="parameter" type="dataType" maxOccurs="unbounded"/>
        </xs:choice>
      </xs:complexType>
    </xs:element>
This option is unable to express (A and B) or (C and D) nicely, because it looks and behaves like the conjunctive normal form ;-) To fit that form, the expression has to be expanded: (A and B) or (C and D) <=> (A or C) and (B or C) and (A or D) and (B or D)
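The equivalence quoted above can be checked by brute force over all 16 truth assignments. This is only a minimal Python sketch; the function names `dnf` and `cnf` are illustrative labels, not anything from the schema:

```python
from itertools import product

def dnf(a, b, c, d):
    # Disjunctive form: (A and B) or (C and D)
    return (a and b) or (c and d)

def cnf(a, b, c, d):
    # Conjunctive form: (A or C) and (B or C) and (A or D) and (B or D)
    return (a or c) and (b or c) and (a or d) and (b or d)

# Compare the two forms on every combination of truth values.
equivalent = all(
    dnf(*values) == cnf(*values)
    for values in product([False, True], repeat=4)
)
print(equivalent)  # True
```

Two clauses in the disjunctive form become four in the conjunctive form, which is the blow-up that makes option a) awkward for this case.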
b) Cleaner option with a cleaner xs:choice, and backwards compatible with the current schema, i.e. no new mandatory elements:
    <xs:element name="function" minOccurs="0" maxOccurs="unbounded">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="operation" maxOccurs="unbounded">
            ...
          </xs:element>
          <xs:element name="input" type="dataType" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element name="output" type="dataType" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element name="or" minOccurs="0" maxOccurs="unbounded">
            <xs:complexType>
              <xs:choice>
                <xs:element name="input" type="dataType" minOccurs="2" maxOccurs="unbounded"/>
                <xs:element name="output" type="dataType" minOccurs="2" maxOccurs="unbounded"/>
              </xs:choice>
            </xs:complexType>
          </xs:element>
          <xs:element name="comment" minOccurs="0">
            ...
          </xs:element>
          <xs:element name="cmd" minOccurs="0">
            ...
          </xs:element>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
Still, this can't express (A and B) or (C and D) nicely, as it's still based on the conjunctive normal form.
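To make the conjunctive reading of option b) concrete, here is a rough Python sketch that reads a hypothetical instance and checks whether a set of supplied inputs satisfies it. It assumes that every plain input is required, that each or group needs at least one of its members (inclusive "or"), and that plain text stands in for the real dataType content; the instance and the names `read_clauses` and `satisfies_cnf` are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of option b): B and (A or C).
INSTANCE = """
<function>
  <input>B</input>
  <or>
    <input>A</input>
    <input>C</input>
  </or>
</function>
"""

def read_clauses(xml_text):
    """Return one set of alternatives per clause of the conjunction."""
    root = ET.fromstring(xml_text)
    # A plain <input> directly under <function> is a clause with one member.
    clauses = [{i.text.strip()} for i in root.findall("input")]
    # Each <or> group is a clause listing several alternatives.
    clauses += [
        {i.text.strip() for i in group.findall("input")}
        for group in root.findall("or")
    ]
    return clauses

def satisfies_cnf(clauses, supplied):
    """True if every clause has at least one member among the supplied inputs."""
    return all(clause & set(supplied) for clause in clauses)

clauses = read_clauses(INSTANCE)
print(satisfies_cnf(clauses, {"B", "C"}))  # True
print(satisfies_cnf(clauses, {"A", "C"}))  # False, B is missing
```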
c) Or a super-clean option, without xs:choice but with 2 new mandatory elements, looking and behaving like the disjunctive normal form:
    <xs:element name="input" minOccurs="0" maxOccurs="unbounded">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="option" maxOccurs="unbounded">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="parameter" type="dataType" maxOccurs="unbounded"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
This also enables expressing (A and B) or (C and D) nicely. The price to pay is that it forces repetition, e.g. (A and B) or (C and B) or (D and B) if B is always mandatory. But from the user's point of view this is not a problem, but rather a clear enumeration of the choices! The only remaining inconvenience is that at least one overarching option element always has to be added. Still, the cleanest ultimate solution!
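As a rough illustration of the disjunctive reading of option c), the Python sketch below parses a hypothetical instance for (A and B) or (C and D) and checks supplied inputs against it. Plain text stands in for the real dataType content, and the names `read_options` and `satisfies_dnf` are invented for this example:

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of option c): (A and B) or (C and D).
INSTANCE = """
<input>
  <option>
    <parameter>A</parameter>
    <parameter>B</parameter>
  </option>
  <option>
    <parameter>C</parameter>
    <parameter>D</parameter>
  </option>
</input>
"""

def read_options(xml_text):
    """Return one set of parameter names per <option> element."""
    root = ET.fromstring(xml_text)
    return [
        {p.text.strip() for p in option.findall("parameter")}
        for option in root.findall("option")
    ]

def satisfies_dnf(options, supplied):
    """True if the supplied inputs cover every parameter of at least one option."""
    return any(option <= set(supplied) for option in options)

options = read_options(INSTANCE)
print(satisfies_dnf(options, {"A", "B"}))  # True
print(satisfies_dnf(options, {"A", "D"}))  # False, no option is fully covered
```

Each option maps directly onto one valid way of calling the tool, which is the "clear enumeration of the choices" mentioned above.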
3. <xs:element name="data" type="EDAMdata" maxOccurs="unbounded"> in dataType and its xs:restrictions, to fix the semantic part of #2. I'm sure the backend and GUI fixes will be trivial, as that is allowed for both Operations and Formats (just not Data). In general, this is trivial compared to 2.
Thanks a lot for this. We should give it more thought. I don't want to change anything for the next release, for fear of changing too many things all at once (esp. something at the core of the model, like this).
For now we have nice clear guidelines, and we can improve on things, most probably, as soon as the quality of the bio.tools entries has improved a bit and a more sophisticated approach is warranted.
Ok, @joncison.
Should we update at least the 3. (<xs:element name="data" type="EDAMdata" maxOccurs="unbounded"> in dataType and its xs:restrictions)?
For now I'm inclined to leave it as-is, i.e.:
but revisit once the existing annotations are improved, and such deeper annotation is desirable. Bear in mind there's a big ongoing clean-up of existing EDAM topic and operation annotations (https://biotools.sifterapp.com/issues/156) and until that's finished, data and format (whilst super important) are a secondary concern ... for now! cc @hansioan
i.e. the classic example where a tool processes a sequence but this can be specified as a raw sequence or by an identifier.
It seems to me the natural way to model this is to allow 1...many Data operations for an Input or Output. However, very clear guidelines would be needed, i.e. we want "many" Data operations to imply that this input can be specified in more than one way, and not that this input can be considered as two types of data.
this issue is just intended to get a discussion going .... cc @matuskalas