turbomam opened this issue 2 years ago
Note that we are not treating `|` as a literal delimiter. It is the "or" operator within enumerations and between Value syntax components.
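To illustrate `|` acting as the "or" operator rather than as a literal delimiter, here is a minimal Python sketch that compiles a bracketed enumeration into an anchored regular expression. The enumeration `[aerobe|anaerobe|facultative]` and the helper `enumeration_to_regex` are hypothetical and chosen only for illustration. This is not how LinkML or the MIxS tooling actually processes Value syntax.

```python
import re


def enumeration_to_regex(value_syntax: str) -> re.Pattern:
    """Compile a bracketed enumeration like '[yes|no]' into an anchored regex.

    '|' is treated as the "or" operator between choices, not as a literal
    character that may appear inside a submitted value.
    """
    if not (value_syntax.startswith("[") and value_syntax.endswith("]")):
        raise ValueError(f"not a bracketed enumeration: {value_syntax!r}")
    choices = value_syntax[1:-1].split("|")
    # Escape each choice so characters such as '.' or '(' match literally.
    alternation = "|".join(re.escape(choice.strip()) for choice in choices)
    return re.compile(rf"^(?:{alternation})$")


# Hypothetical enumeration, for illustration only.
pattern = enumeration_to_regex("[aerobe|anaerobe|facultative]")
print(bool(pattern.match("anaerobe")))         # one choice: accepted
print(bool(pattern.match("aerobe|anaerobe")))  # literal pipe: rejected
```

Because `|` becomes regex alternation here, a submitted value that contains a literal pipe does not match any single choice.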
I tried to find an issue for this, and this is the closest one I could find.
The inconsistent use of these "literal delimiters" is confusing. I propose:

- `,` = separation of items that are part of a list & related
  - `agrochem_addition` = `roundup, 5 milligram per liter, 2018-06-21`
- `:` = in ontology fields only
  - `HACCP_term` = `tetrodotoxic poisoning [FOODON:03530249]`
- `;` = separating parts of a multivalued slot
  - `agrochem_addition` = `roundup, 5 milligram per liter, 2018-06-21; roundup, 5 milligram per liter, 2019-10-13`
  - `HACCP_term` = `tetrodotoxic poisoning[FOODON:03530249]; neurotoxic shellfish poisoning[FOODON:03530246]`

I see no use for `/`, `-`, `x`, or `|`. I know `|` is often used in place of `;`, but they seem the same to me, and the choice of `|` vs `;` seems to have been at the discretion of the creator.
@turbomam curious on your thoughts.
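Here is a minimal Python sketch of how the proposed convention could be checked for an ontology-backed slot like `HACCP_term`. It assumes that each item has the shape `label [PREFIX:ID]` and that `:` appears only inside the bracketed ID. The helper name `parse_terms` and the regex are hypothetical. The separator is a parameter, so the same code works whether `;` or `|` ends up separating values.

```python
import re

# Hypothetical pattern for one ontology-backed item: "label [PREFIX:ID]".
# Under the proposal, ':' is expected only inside the bracketed CURIE,
# so the label part may not contain ':', '[' or ']'.
TERM = re.compile(r"^(?P<label>[^\[\]:]+?)\s*\[(?P<curie>[A-Za-z_]+:\d+)\]$")


def parse_terms(value: str, sep: str = ";") -> list[tuple[str, str]]:
    """Split a multivalued slot on `sep`, then parse each 'label [CURIE]' item."""
    items = []
    for raw in value.split(sep):
        match = TERM.match(raw.strip())
        if match is None:
            raise ValueError(f"not a 'label [PREFIX:ID]' term: {raw.strip()!r}")
        items.append((match["label"], match["curie"]))
    return items


haccp = ("tetrodotoxic poisoning[FOODON:03530249]; "
         "neurotoxic shellfish poisoning[FOODON:03530246]")
print(parse_terms(haccp))
```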
I think this analysis is a great step forward. For me, the next step is some valid and invalid data files that illustrate your positions.
I hope you, I, and everybody else are clear on the current implementation: the LinkML language doesn't have any concept of a "value syntax", and the same goes for "expected value". I made a good-faith mapping of those columns in the MIxS 6.0 Google sheet to LinkML `range` and `pattern` constraints on the corresponding slots.
I think your comment above is addressing the fact that I totally punted on slots like `agrochem_addition`, which have pseudo-patterns for their flattened, pre-composed values.
In the `agrochem_addition` example, I disagree that `,` is being used for elements of a list, and I think it may be hard to use the word "related" in a technical specification. `roundup, 5 milligram per liter, 2018-06-21` is a pre-composed sequence of things that would be captured in sub-slots in NMDC, like `agrochem_addition.agent`, `agrochem_addition.dose` and `agrochem_addition.application_date`.
I still like your ideas for bringing clarity to this, and hopefully we can show examples of successful and unsuccessful validation.
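The point about pre-composed values can be made concrete with a small Python sketch. It splits the flat string into the hypothetical NMDC-style sub-slots named above. The `AgrochemAddition` class and the `decompose` function are illustrative assumptions. They are not part of MIxS, NMDC, or LinkML.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AgrochemAddition:
    # Hypothetical sub-slots; MIxS itself defines only the flat string.
    agent: str
    dose: str
    application_date: date


def decompose(value: str) -> AgrochemAddition:
    """Split 'agent, dose, ISO date' into its three positional parts."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 comma-separated parts, got {len(parts)}")
    agent, dose, when = parts
    # date.fromisoformat raises ValueError for anything but YYYY-MM-DD here.
    return AgrochemAddition(agent, dose, date.fromisoformat(when))


print(decompose("roundup, 5 milligram per liter, 2018-06-21"))
```

Splitting on `,` by position is brittle. An agent name that itself contains a comma would break it, which is one argument for capturing these parts in separate slots instead of a flattened string.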
As far as LinkML is concerned, `|` is the only acceptable character for delimiting multiple values in a multivalued slot. In fact, in order for LinkML to parse the `HACCP_term` you provided out of a CSV or TSV, the values would have to be rendered like this:

`[tetrodotoxic poisoning[FOODON:03530249]|neurotoxic shellfish poisoning[FOODON:03530246]]`
The outer square brackets are currently required. I'm not sure how the inner square brackets will be handled. I'll take responsibility for working through those examples, but hopefully @cmungall will have some thoughts to share.
But all of this would contradict your stated preference for using `;` to concatenate multiple values.
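One way a loader could cope with the inner square brackets is to strip only the outermost pair and split on `|` at bracket depth zero. The following Python sketch assumes that approach. It is not LinkML's actual CSV/TSV loader, and `split_bracketed_list` is a hypothetical name.

```python
def split_bracketed_list(cell: str, sep: str = "|") -> list[str]:
    """Split '[a[X:1]|b[X:2]]' on `sep`, ignoring separators inside inner brackets."""
    cell = cell.strip()
    if not (cell.startswith("[") and cell.endswith("]")):
        raise ValueError("expected outer square brackets")
    body = cell[1:-1]
    items, current, depth = [], [], 0
    for ch in body:
        # Track nesting so that inner '[...]' (e.g. CURIEs) stay intact.
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ']'")
        if ch == sep and depth == 0:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '['")
    items.append("".join(current).strip())
    return items


cell = ("[tetrodotoxic poisoning[FOODON:03530249]|"
        "neurotoxic shellfish poisoning[FOODON:03530246]]")
print(split_bracketed_list(cell))
```

For this particular value, a naive `cell[1:-1].split("|")` gives the same result. Tracking depth only matters if a `|` or an unmatched bracket ever appears inside an inner `[...]`.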
> In the `agrochem_addition` example, I disagree that `,` is being used for elements of a list, and I think it may be hard to use the word "related" in a technical specification. `roundup, 5 milligram per liter, 2018-06-21` is a pre-composed sequence of things that would be captured in sub-slots in NMDC, like `agrochem_addition.agent`, `agrochem_addition.dose` and `agrochem_addition.application_date`.
>
> I still like your ideas for bringing clarity to this, and hopefully we can show examples of successful and unsuccessful validation.
@turbomam
Let's separate NMDC from GSC here. NMDC cares about the different pieces of `agrochem_addition` because we have a database. As a standard, it's up to the institutes that implement this slot to determine how it's stored. GSC doesn't have `.agent`, `.dose`, or `.application_date`.
As such, I'm not sure what you're trying to get at with this for GSC. For NMDC, yes, absolutely. But there's nothing GSC would do?
> As far as LinkML is concerned, `|` is the only acceptable character for delimiting multiple values in a multivalued slot. In fact, in order for LinkML to parse the `HACCP_term` you provided out of a CSV or TSV, the values would have to be rendered like this: `[tetrodotoxic poisoning[FOODON:03530249]|neurotoxic shellfish poisoning[FOODON:03530246]]`
>
> The outer square brackets are currently required. I'm not sure how the inner square brackets will be handled. I'll take responsibility for working through those examples, but hopefully @cmungall will have some thoughts to share.
>
> But all of this would contradict your stated preference for using `;` to concatenate multiple values.
Ah! OK, well then no `;` and only use `|`.
So...
- `,` = separation of items that are part of a list & related
  - `agrochem_addition` = `roundup, 5 milligram per liter, 2018-06-21`
- `:` = in ontology fields only
  - `HACCP_term` = `tetrodotoxic poisoning [FOODON:03530249]`
- `|` = separating parts of a multivalued slot
  - `agrochem_addition` = `roundup, 5 milligram per liter, 2018-06-21 | roundup, 5 milligram per liter, 2019-10-13`
  - `HACCP_term` = `tetrodotoxic poisoning[FOODON:03530249] | neurotoxic shellfish poisoning[FOODON:03530246]`

Doesn't that example for `agrochem_addition` above use a semicolon where we agreed to use a pipe?
I agree that my mention of hypothetical NMDC sub-slots like `agrochem_addition.agent`, `agrochem_addition.dose` and `agrochem_addition.application_date` isn't directly actionable by MIxS. But getting into this discipline is the best hope MIxS has for terms like `agrochem_addition` to become machine-actionable.
At this point in time they are inconsistent and unenforceable.
I am really concerned by the apparent reality that none of the people we routinely interact with know how (or are willing) to make a valid, minimal table of samples (MimsSoil, perhaps) that complies with the standard. Once a couple of people contribute in that way, we can incrementally resolve issues like the syntax and legal punctuation in `agrochem_addition`, etc.
> Doesn't that example for `agrochem_addition` above use a semicolon where we agreed to use a pipe?
Forgot to update that one. Fixed.
Several literal delimiters appear in the various Value syntaxes: `-`, `/`, `:`, `;`, and `x`.
When validating, should we allow an arbitrary amount of whitespace around delimiters?
If not, then the Value syntaxes will be interpreted literally with respect to padding, exactly the way the term submitters entered them.
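The trade-off in the whitespace question can be sketched in Python. The sketch builds a pattern from a Value syntax's literal delimiter, either strictly or with optional whitespace on both sides. The `{float}/{float}` syntax, the `FLOAT` regex, and `delimiter_pattern` are hypothetical illustrations, not MIxS definitions.

```python
import re


def delimiter_pattern(fields: list[str], delimiter: str, lenient: bool) -> re.Pattern:
    """Build an anchored regex: field patterns joined by a literal delimiter.

    lenient=True allows any amount of whitespace on either side of the delimiter;
    lenient=False requires the delimiter exactly as written in the value syntax.
    """
    escaped = re.escape(delimiter)
    joiner = rf"\s*{escaped}\s*" if lenient else escaped
    return re.compile("^" + joiner.join(f"(?:{field})" for field in fields) + "$")


# Hypothetical "{float}/{float}" value syntax.
FLOAT = r"[-+]?\d+(?:\.\d+)?"
strict = delimiter_pattern([FLOAT, FLOAT], "/", lenient=False)
lenient = delimiter_pattern([FLOAT, FLOAT], "/", lenient=True)

for value in ["3.5/7", "3.5 / 7"]:
    print(repr(value), bool(strict.match(value)), bool(lenient.match(value)))
```

The strict pattern rejects `3.5 / 7` while the lenient one accepts it. The choice therefore determines whether padding entered by submitters counts as a validation error.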