Open simroux opened 4 years ago
Thanks! We also should use this as an opportunity to spot dangerous semantic variation of the same label across packages.
@simroux when we create the ontology terms to represent these, I suppose they will be sediment-specific, right? So "depth" is actually "depth in the sediment"? The table above suggests this is "depth in soil", which isn't the same thing (i.e. it wouldn't be the same ontology class).
This also raises an issue about domain conflict that @ramonawalls can perhaps help resolve with BCO:
ENVO can deal with properties of environmental entities. "Depth of sediment" is fine as are things like "nitrate concentration in sediment". However, what seems to be needed in the "depth" case (ignoring the soil/sediment conflict) is "depth of sampling event in a sediment column" which is more a BCO field. @ramonawalls thoughts?
@pbuttigieg: These are two good questions. For question #1: I would intuitively say sediments are a subset of soils; however, this is not what is currently reflected in ENVO, where soil and sediment are two distinct classes at the same rank (under "Environmental material"), and there are certainly some very good reasons and expert opinion behind this. So then yes, while I initially tried to re-use terms associated with soil (e.g. "depth in soil"), these should be adapted to sediment (i.e. "depth in the sediment").
For BCO vs ENVO, I agree that "depth" will most often be associated with a sample or measurement (I could imagine some in situ probe measurements that would need to be associated with "depth in sediment" information). So we may need both BCO and ENVO?
Building on the above - many of these seem to be package agnostic:
"concentration of magnesium" inheres in any sampled material, not just sediment.
Two routes present themselves:
1. We use BCO with its notion of "sample" to compose things like "concentration of magnesium in a sample of environmental material" axiomatised similar to
   'concentration of'
     and ('inheres in' some
       (magnesium
         and ('part of' some (BCO:sample and 'composed primarily of' some 'environmental material'))))
   where the ENVO medium/material field in the MIxS core checklist will be parsed to specify the environmental material.
2. For each package, we pre-compose specific IRIs for each property described by each parameter. When we RDFise MIxS, this will mean that each field in each package will have its own IRI.
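To make the contrast between the two routes above more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the `https://example.org/mixs/` base IRI, and the package and field names are hypothetical, and the class expression is emitted as a plain string in the style of the axiom above rather than loaded into any OWL tooling.

```python
from itertools import product


def compose_concentration(chemical: str, material: str = "environmental material") -> str:
    """Route 1: build one class expression from reusable parts (string only)."""
    return (
        "'concentration of'\n"
        "  and ('inheres in' some\n"
        f"    ({chemical}\n"
        "      and ('part of' some (BCO:sample "
        f"and 'composed primarily of' some '{material}'))))"
    )


# Hypothetical base IRI, used for illustration only.
BASE = "https://example.org/mixs/"


def precompose_iris(packages, fields):
    """Route 2: mint one IRI per (package, field) pair."""
    return {(p, f): f"{BASE}{p}/{f}" for p, f in product(packages, fields)}


# Route 1: the material comes from the ENVO medium/material field.
expr = compose_concentration("magnesium", "sediment")
print(expr)

# Route 2: the number of IRIs grows with packages x fields.
iris = precompose_iris(["soil", "sediment"], ["depth", "magnesium"])
print(len(iris))
```

In this sketch, route 1 covers a new compound or material by calling the same function with different arguments, whereas route 2 needs a new IRI for every package and field combination.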
Thoughts @ramonawalls @cmungall ?
> So then yes, while I initially tried to re-use terms associated with soil (e.g. "depth in soil"), these should be adapted to sediment (i.e. "depth in the sediment").
Thanks for the clarification @simroux - this is what I meant above by some of the 'dangerous semantic variation' in MIxS. This wasn't a big issue in the past, but now when we're trying to get more organised and precise, we'll need to clean this up.
> For BCO vs ENVO, I agree that "depth" will most often be associated with a sample or measurement (I could imagine some in situ probe measurements that would need to be associated with "depth in sediment" information). So we may need both BCO and ENVO?
It's likely we'll need a combination of ontologies to handle the semantics implicit in several MIxS terms, but don't worry too much - this is normal and healthy in the OBO world as we don't want to try to build one ontology that covers everything. OBO ontologies interoperate and can be spliced together as needed.
@cmungall @ramonawalls An OBO application ontology for MIxS may not be a bad idea here, so we get the IDs and RDF we need to take MIxS forward in an interoperable way.
Cleaning up these kinds of issues would be great :-) From my outsider perspective, option 1 seems the best since, once it is in place, you could expand it to new compounds / measurements or new environments relatively "easily", but I've no idea how much more complex this would be in terms of implementation.
> From my outsider perspective, option 1 seems the best since, once it is in place, you could expand it to new compounds / measurements or new environments relatively "easily", but I've no idea how much more complex this would be in terms of implementation.
I'm leaning that way too - it's a good segue into more efficient and structured use of reference ontologies that saves the need for massive inflation.
A potential issue with my draft approach for option 1 is that there can be multiple material terms in the 3rd slot.
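A rough Python sketch of why multiple material terms complicate the composed expression: the filler after 'composed primarily of' then has to combine the terms somehow, and the choice of operator changes the meaning. Nothing here is settled in the thread; the helper name `material_filler` and the example terms are hypothetical, and whether a disjunction or a conjunction (or neither) is appropriate is an open question.

```python
def material_filler(materials, operator="or"):
    """Join several material terms into one class-expression filler (string only)."""
    if operator not in ("or", "and"):
        raise ValueError("operator must be 'or' or 'and'")
    quoted = [f"'{m}'" for m in materials]
    if len(quoted) == 1:
        # A single term needs no combining operator.
        return quoted[0]
    return "(" + f" {operator} ".join(quoted) + ")"


# Hypothetical example values for the material slot.
terms = ["sediment", "sea water"]

union = material_filler(terms, "or")          # sample made of either material
intersection = material_filler(terms, "and")  # material that is both at once
print(union)
print(intersection)
```

The two strings differ only in the operator, but they describe different classes, so the pattern would need an explicit decision on how multi-valued material fields are interpreted.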
I don't think we need to bring in BCO classes here
Hmm. Not all soils are sediments in the classical sense. Organic soils in particular. 'Sediment' has a genetic feel about it (related to sedimentation?) while soils are formed through a variety of processes, some involving transport, but others not.
Somewhat related is this long-outstanding NTR, the narrative for which now includes the textual definitions of the Australian soil orders: https://github.com/EnvironmentOntology/envo/issues/825
Based on work done in the NMDC Ontology workshop. A substantial part of these terms would also apply to the MIxS soil checklist which @ukaraoz is curating. To order the different terms, we looked at which fields were most often filled in in submissions to ENA (thanks to data provided by @josieburgin) for the soil and sediment checklists. The list below is ordered from the most frequently to the least frequently used terms.