GenomicsStandardsConsortium / mixs

"Minimum Information about any (X) Sequence" (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

New term proposal : isotopolog #577

Open mslarae13 opened 1 year ago

mslarae13 commented 1 year ago

New term details

For us to assess a new term request we require the following details:

Term name - isotopolog
Structured comment name - isotopolog
Definition - Isotopologue (isotope source/substrate/molecule) added to the biological sample. List the PubChem Compound Identification (CID) number, or, if an undefined mixture, a short description. If more than one isotopologue was used in this sample, use a pipe to delimit each isotopologue, and ensure all isotopologue-describing fields describe all isotopologues in this order.
Expected value - text or CID
Value syntax - {termLabel} {[termID]}|{text}
Example - toluene [pubchem.compound:1140] or toluene [pubchem.compound:1140] | water [pubchem.compound:962] or root exudates
Preferred unit - NA
Package(s) - new checklist term applicable to all packages / extensions
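
For illustration, here is a minimal Python sketch of how a value written in the proposed syntax could be split and classified. The function name `parse_isotopolog` and the returned dictionary shape are hypothetical, and the `pubchem.compound:` prefix is simply taken from the example above; none of this is part of the MIxS specification.

```python
import re

# Matches "label [pubchem.compound:1234]". The prefix comes from the
# example in this issue; the pattern itself is an assumption.
TERM_RE = re.compile(r"^(?P<label>.+?)\s*\[pubchem\.compound:(?P<cid>\d+)\]$")


def parse_isotopolog(value):
    """Split a pipe-delimited value and classify each entry."""
    entries = []
    for part in value.split("|"):
        part = part.strip()
        if not part:
            raise ValueError("empty entry between pipes")
        match = TERM_RE.match(part)
        if match:
            # The {termLabel} {[termID]} branch of the value syntax.
            entries.append(
                {"label": match.group("label"), "cid": int(match.group("cid"))}
            )
        else:
            # Anything else falls into the free-text {text} branch.
            entries.append({"label": part, "cid": None})
    return entries


if __name__ == "__main__":
    print(parse_isotopolog("toluene [pubchem.compound:1140] | water [pubchem.compound:962]"))
    print(parse_isotopolog("root exudates"))
```

Note that a bare number such as `1140` does not match the labelled pattern, so this sketch would treat it as free text.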

Additional context

Add any other context about the new term here.

simpso91 commented 1 year ago

Term name - isotopologue
Structured comment name - isotopologue
Definition - Isotopologue (isotope source/substrate/molecule) added to the biological sample. List the PubChem Compound Identification (CID) number, or, if an undefined mixture, a short description. If more than one isotopologue was used in this sample, use a pipe to delimit each isotopologue, and ensure all isotopologue-describing fields describe all isotopologues in this order.
Expected value - text or CID
Value syntax - {termLabel} {[termID]}|{text}
Example - 1140 or cellulose or 1140 | cellulose
Preferred unit - NA

mslarae13 commented 1 year ago

@simpso91 is 1140 a different compound? Or is it the CID for cellulose?

simpso91 commented 1 year ago

Yes, 1140 is a different compound (toluene)

We could use 962 (water) instead of cellulose so it would look like: (three different examples) 1140 or 1140 | 962 or root exudates

turbomam commented 1 year ago

@only1chunts , @ramonawalls and I are working hard to get the various components of a term definition to be fully compatible with one another. Can we work through that for this term, and possibly other terms in your new package?

The Value syntax is '{termLabel} {[termID]}|{text}' but the examples are

All of those are valid as text, but free text is obviously the worst choice if you want the data you and others submit to be FAIR.

PubChem CIDs are also mentioned, presumably as a more controlled namespace. That's great! Is that what '1140 | cellulose' is supposed to be an example of? A {termLabel} {[termID]} example could be 'cellulose [pubchem.compound:1140]' (as redirected by identifiers.org and bioregistry)
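
As a rough sketch of what "redirected by identifiers.org and bioregistry" could look like in practice, the snippet below builds resolver URLs from a CURIE such as `pubchem.compound:1140`. The helper name `curie_to_urls` is hypothetical, and the URL patterns (resolver base followed by the CURIE) are an assumption about how those two services accept compact identifiers; check them against the resolvers' own documentation.

```python
def curie_to_urls(curie):
    """Build candidate resolver URLs for a CURIE like 'pubchem.compound:1140'."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    # Assumed pattern: resolver base URL followed by the full CURIE.
    return {
        "identifiers.org": f"https://identifiers.org/{prefix}:{local_id}",
        "bioregistry": f"https://bioregistry.io/{prefix}:{local_id}",
    }


if __name__ == "__main__":
    for name, url in curie_to_urls("pubchem.compound:1140").items():
        print(name, url)
```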

turbomam commented 1 year ago

I really think we should update the issue template for new term requests to clarify the interrelatedness of the various term attributes.

simpso91 commented 1 year ago

Thank you @turbomam ! In that case, the examples should be:

Also, @mslarae13 in the manuscript (because I had to shorten all the structured comment names anyway), we are moving towards using the American spelling "isotopolog" instead of "isotopologue". Could this entry be changed to reflect this?

only1chunts commented 1 year ago

@turbomam - I have added a line to the issue ticket templates requesting addition of relationship to other terms. I had hoped that sort of detail would be included under the "additional context" section, but I guess it's better to be explicit.

simpso91 commented 1 year ago

In the interest of simplifying this field, Roli proposed having this field accept only numerical values:

mslarae13 commented 1 year ago

@simpso91 what is the additional column? We can't add columns depending on another column. The same reason we can't have isotopolog_1, isotopolog_2 ... etc depending on how many there are. You have to pipe them.

So we would have to make this column the numerical value... and have another column for the name. Making the name column optional.
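
A minimal sketch of that two-column idea, assuming both columns are pipe-delimited and kept in the same order: the function name `pair_columns` and the column semantics are hypothetical, not an agreed MIxS design.

```python
import re


def pair_columns(cid_column, name_column=""):
    """Pair a piped numeric CID column with an optional piped name column."""
    cids = [c.strip() for c in cid_column.split("|")]
    if not all(re.fullmatch(r"[0-9]+", c) for c in cids):
        raise ValueError("every CID entry must be a whole number")
    if not name_column.strip():
        # The name column is optional, so every name is left unset.
        return [(int(c), None) for c in cids]
    names = [n.strip() for n in name_column.split("|")]
    if len(names) != len(cids):
        raise ValueError("CID and name columns must list the same number of entries")
    # An empty name between pipes is allowed and recorded as None.
    return [(int(c), n or None) for c, n in zip(cids, names)]


if __name__ == "__main__":
    print(pair_columns("1140 | 962", "toluene | water"))
    print(pair_columns("1140"))
```

Checking that the two columns have the same number of entries is what enforces the "describe all isotopologues in this order" requirement from the definition.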

The misspelling when a pubchem ID doesn't exist is a very valid concern.

mslarae13 commented 1 year ago

Discussed: if a compound is not in PubChem, it is hard to work with bioinformatically. Add a link explaining how to go to PubChem and fill out your ID.