ESIPFed / sweet

Official repository for Semantic Web for Earth and Environmental Terminology (SWEET) Ontologies

Earth Science Idol - Using the YAMZ metadictionary as a common knowledge harvesting target for SWEET/OBO (esp. ENVO) #33

Closed bethhuffer closed 4 years ago

bethhuffer commented 7 years ago

Question from Mike Huhns re: your pilot project aligning SWEET and ENVO: I see several possibilities and would like to understand which you intend. The possibilities are

  1. Augment ENVO with terms from SWEET (eventually SWEET might not be needed, because ENVO would be a superset of it)
  2. Augment SWEET with terms from ENVO (eventually ENVO might not be needed)
  3. Augment both SWEET and ENVO with terms from each other (eventually SWEET and ENVO would be identical)
  4. Identify the intersection of SWEET and ENVO, which in some sense would be a core ontology for the environmental domain. It might even be maintained separately, with SWEET and ENVO being unique extensions of it
  5. Create mappings between terms in SWEET and terms in ENVO (this is what Line Pouchard’s student did ~2 years ago), so that the union of SWEET and ENVO could be used, while their separate development could continue.

    So, please clarify your approach and let me know how I can help.

    One more thing, I have a concern about using yamz, because it appears to operate on one pair of terms at a time. As you know, the power of an ontology is not in the terms and their attached English definition, but is in the relationships among the terms. For example, the term “soil” might be a verb or a noun, and this is determined by whether it is a subclass of the term “Process” or a subclass of the term “Physical Object.”
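A minimal sketch of how the five options differ, using plain Python sets. The term labels and the `sweet:`/`envo:` prefixes are invented for illustration and are not the real contents of either ontology.

```python
# Hypothetical label sets; not the real contents of SWEET or ENVO.
sweet_terms = {"soil", "sediment", "sand", "radiative flux"}
envo_terms = {"soil", "sediment", "sand", "biome"}

# Options 1-3: augmenting one or both ontologies drives toward the union.
merged = sweet_terms | envo_terms

# Option 4: the intersection as a candidate shared core,
# with each ontology keeping its own extension.
core = sweet_terms & envo_terms
sweet_only = sweet_terms - core
envo_only = envo_terms - core

# Option 5: leave both ontologies untouched and record explicit mappings.
mappings = {f"sweet:{t}": f"envo:{t}" for t in sorted(core)}

print(sorted(core))
print(mappings)
```

The sketch matches on labels alone, which is exactly the weakness raised in the yamz concern above: it says nothing about whether two terms with the same label sit in the same place in each hierarchy.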

bethhuffer commented 7 years ago

I believe the intended domain of SWEET is larger than the intended domain of ENVO, so I think we can rule out 1 - 3. Although we also want to align SWEET with ChEBI, so maybe 1 - 3 are viable but with other ontologies besides ENVO in the mix. I suspect some flavor of 4 is what we'll work toward. But maybe we'll discover for some reason that 5 is a better approach. I think that is something that our little pilot project will want to explore. As for where to start, I was thinking that identifying the current intersection of SWEET and ENVO would be a good place to start.

For this project, we are using yamz as a convenient, user-friendly tool for crowd-sourcing terms and definitions. So if the term 'soil' is in the intersection of SWEET and ENVO, and there are conflicting definitions, we can enter the term in yamz and let the subject matter experts weigh in on what the right definition is. yamz will assign a unique ID to the term, and participating community members will try to come to a consensus on its definition. For the most part, if we can't come to a consensus, we will want to consider the possibility that we are dealing with two different, homonymous terms. Yamz won't stop us from having soil (noun) and soil (verb). Each of those would get a unique ID and, based on the definitions, we would decide what (if anything) in the SWEET and ENVO ontologies the term(s) refer(s) to. If there is nothing in the ontologies, and we think there should be something, then based on the approved definition, we would make it an item in the ontology - i.e., introduce an ontological entity to the ontology, which is the referent of the term "soil". Yamz is really playing the role here of a very user-friendly platform that is accessible to subject matter experts who may not want to deal with the ontology, but may want to weigh in on the definition of a term. It's not meant to replace the ontology. One of the things we can use yamz for is commenting on where we think the referent of a newly introduced term - the ontological entity - should go in an existing ontology. In other words, consideration of its ontological characteristics might be part of the process of defining it, and we can use yamz to have that discussion.

I think to get started, we should try to identify some part of SWEET that we want to start with, where we are likely to find overlap with ENVO. If anybody has thoughts about what the priorities might be, please speak up! From there, we will probably need to find the ontological entities that we think are in fact in the intersection of the two ontologies. Here, we may need to be careful. Something like radiative flux is also sometimes referred to as 'irradiance' and so we will not only have to look for matching terms, but we will have to look for (possibly) identical ontological entities, even when they are referred to using different terms in the two ontologies. (Of course, the process of defining terms can help us here. If 'irradiance' and 'radiative flux' are entered as separate terms in yamz, their eventual placement within the ontology(ies) might reveal to us that we have synonyms on our hands.)
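The point about 'irradiance' and 'radiative flux' can be sketched as a synonym-aware label match. This is only an illustration: the synonym table, the helper names, and the sample labels are all assumptions, and whether the two terms really are synonyms is the kind of question the experts would have to settle.

```python
def normalize(label: str) -> str:
    """Lower-case and collapse whitespace so trivial spelling variants match."""
    return " ".join(label.lower().split())

# Hypothetical synonym table mapping a label to a preferred form.
# Treating these two as synonyms is an assumption for illustration.
SYNONYMS = {"irradiance": "radiative flux"}

def canonical(label: str) -> str:
    """Return the preferred form of a label, after normalization."""
    norm = normalize(label)
    return SYNONYMS.get(norm, norm)

def candidate_overlap(labels_a, labels_b):
    """Return canonical forms that occur in both label collections."""
    a = {canonical(label) for label in labels_a}
    b = {canonical(label) for label in labels_b}
    return a & b

# Invented sample labels, not taken from SWEET or ENVO.
sweet_labels = ["Radiative Flux", "Soil"]
envo_labels = ["irradiance", "soil", "biome"]
overlap = candidate_overlap(sweet_labels, envo_labels)
print(sorted(overlap))
```

A match found this way is only a candidate for the intersection; it still needs a check that both ontologies mean the same entity.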

MichaelHuhns commented 7 years ago

Thanks for the explanation! Now I see two daunting problems:

  1. If you try to use yamz to reconcile "Soil" in ENVO with "Soil" in SWEET, the crowd will find that the one in SWEET has no definition and the one in ENVO has a very detailed definition. Should ENVO's definition be copied to SWEET? No. This would be a bad idea, because "Soil" is a different concept in the two ontologies. The crowd would have to look beyond the terms to discover that "Soil" is a subclass of "Sediment" in SWEET, but "Soil" and "Sediment" are mutual subclasses of "Environmental Material" in ENVO. Furthermore, "Soil" and "Sand" are parallel concepts in SWEET, but "Soil" hasPart "Sand" in ENVO. Seriously, reconciling concepts between ontologies is not solvable by people "weighing in on the definition of a term."
  2. I have low expectations about the "wisdom of a crowd." The crowd thinks that a dolphin is a fish. How do you propose to avoid and fix such misconceptions?
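The structural mismatch described in the first problem can be made concrete with a small check that compares the direct superclasses of same-named terms. The dictionaries below encode only the subclass relations stated in this comment; the function name and data layout are assumptions for illustration.

```python
# Direct superclasses per term, transcribed from the description above.
# These are tiny hand-written fragments, not exports of the real ontologies.
SWEET_PARENTS = {"Soil": {"Sediment"}}
ENVO_PARENTS = {
    "Soil": {"Environmental Material"},
    "Sediment": {"Environmental Material"},
}

def same_label_conflicts(parents_a, parents_b):
    """Terms present in both fragments whose direct superclasses differ."""
    shared = set(parents_a) & set(parents_b)
    return {
        term: (parents_a[term], parents_b[term])
        for term in shared
        if parents_a[term] != parents_b[term]
    }

conflicts = same_label_conflicts(SWEET_PARENTS, ENVO_PARENTS)
for term, (in_sweet, in_envo) in conflicts.items():
    print(f"{term}: SWEET parents {in_sweet}, ENVO parents {in_envo}")
```

A flagged term is not necessarily wrong in either ontology; the check only shows where a shared label hides a different placement that someone has to look at.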

In my experience, the best ontologies have emerged out of a working partnership between someone who understands a domain and someone who understands the domain-independent parts of an ontology metamodel (subclasses, parts, instances, causes, occurs, etc.).

bethhuffer commented 7 years ago

Well, if the two concepts really are different, then we shouldn't try to reconcile them. We should instead consider renaming the SWEET concept "sedimentary soil" or something, to make it clear that it is not intended to refer to all soil, but only that which is also sediment. Alternatively this exercise will help us recognize that soil is misclassified in SWEET, and should be somewhere else in the ontology. The ontologists among us will then go to work figuring out where.

I apologize for using the term "crowd-sourcing". The intent is to have subject matter experts help us in defining terms correctly, or in helping us realize that we need a different term. We'll probably never actually have a whole crowd of them, and if we do, we will give much less weight to the opinions of people who think dolphins are fish than we do the opinions of marine biologists when it comes to defining "dolphin". :-) The YAMZ tool is a convenient platform for presenting terms to people who may not want to deal with a full-fledged ontology. Definitions can include phrases like "that has settled to the bottom of a liquid" to indicate that the concept in question is a subclass of sediment. Ontologists are, in fact, involved in this effort, and they can help ensure that, once we've agreed on what a term means, it is properly situated in the ontology. This is a pilot to try to sort out the way forward in making SWEET a valuable COMMUNITY resource, and in ensuring that it is aligned with ENVO. If it turns out that the approach we outlined while drinking beers at a pub in Bloomington is inadequate, we can always refine it. :-)

lewismc commented 7 years ago

Hi @bethhuffer @MichaelHuhns I think 5 above is the intended solution. @pbuttigieg wdyt?

lewismc commented 7 years ago

@bethhuffer I would like to, if possible, begin poking our roster of subject matter experts to begin providing annotations of SWEET terms as suggested in #20 . Can this be achieved via Earth Science Idol?

bethhuffer commented 7 years ago

@lewismc @MichaelHuhns @pbuttigieg I think that's exactly what we should be doing as part of Earth Science Idol. Do we have any particular SWEET module that is a priority? I think our plan was to try to identify a sub-section of SWEET that overlaps with ENVO so we could work on the alignment process as well. But I suppose the availability of SMEs is also a reasonable criterion for deciding where to start.

MichaelHuhns commented 7 years ago

@bethhuffer @lewismc @pbuttigieg @linepouchard Improving SWEET seems to have two parts: (1) annotating existing concepts as suggested in Issue #20 and (2) fixing improperly classified concepts (e.g., Beth and I noticed that the concept "Soil" is different in ENVO and SWEET). Both of these parts require a SME. Domain independent (aka, domain ignorant, lol) people like me can help by using ontology alignment tools to locate the concepts that are potentially misclassified.

graybeal commented 7 years ago

All, I think there might be a useful activity here to take a look at Scott Peckham's work, I think called Geospatial Standard Names at geoscienceontology.org (haven't looked recently though). I'm not sure exactly how it could be applied but worth considering.

hsu000001 commented 7 years ago

@graybeal, just chiming in here: the Geoscience Standard Names at the link you shared are derived from (and were previously known as) the CSDMS Standard Names http://csdms.colorado.edu/wiki/CSDMS_Standard_Names , and there should be several EarthCube projects that got in touch with domain communities/users. I'm sure Scott would have more details. We also looked at them as part of the SEN EarthCube project, and the CSDMS community has it implemented in some of their stuff as I recall.

rduerr commented 7 years ago

@graybeal Scott's geoscienceontology.org stated intent is "The Geoscience Standard Names Ontology is a schema for describing computational models (and data sets) in a standardized way. It uses Semantic Web technologies and best practices (e.g. RDF, OWL, SKOS) to formalize the concepts needed to provide a deep description of a resource. "

But let's face it, data and models aren't the only things in the world; both SWEET and ENVO encompass a lot more (albeit perhaps at a higher level?).

So the question for me is where and how to align these things. For example, the geoscienceontology that can be downloaded mentions the string "glacier" 554 times! However, if you look at those mentions they tend to be full variable definitions like:

default1:glacier_bottom_ice_flow__y_z_component_of_stress
    a owl:NamedIndividual , geo-upper:Variable ;
    rdfs:label "glacier_bottom_ice_flow__y_z_component_of_stress"@en ;
    geo-upper:hasObject <http://www.geoscienceontology.org/geo-lower/object#glacier_bottom_ice_flow> ;
    geo-upper:hasQuantity default12:y_z_component_of_stress ;
    geo-upper:hasRootObject <http://www.geoscienceontology.org/geo-lower/object#ice_flow> .

I would note that these terms don't have English definitions; but then the variable names are so specific one could claim that none are required.

But even with this much specificity there are things left undefined. For example, now which directions are y and z and what are they measured with respect to (e.g., North, South, direction of flow)? This may be well defined in the modeling community; but that doesn't mean the rest of the world understands.

Also if you look at the term http://www.geoscienceontology.org/geo-lower/object#glacier which is mentioned in several of these variable definitions, this is its definition:

<http://www.geoscienceontology.org/geo-lower/object#glacier>
    a geo-upper:BasicObject , owl:NamedIndividual , geo-upper:SimpleObject , geo-upper:Object ;
    rdfs:label "glacier"@en .

Now this should have a human-readable definition as from this definition most people would not have a clue what a glacier is! Yes, there are other terms that relate back to this one, but that doesn't help the person who wants to understand what a glacier is, what its attributes are!
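A quick way to find terms in this state is to list every subject that has a label but no definition. The sketch below works on hand-written (subject, predicate, object) tuples rather than a parsed file. The use of `skos:definition` as the definition property is an assumption, and the `ex:sea_ice` entry and its placeholder text are invented purely for contrast.

```python
# Triples as (subject, predicate, object) tuples. The glacier rows are
# transcribed loosely from the snippet above; the ex:sea_ice rows are invented.
TRIPLES = [
    ("geo-lower:glacier", "rdf:type", "geo-upper:BasicObject"),
    ("geo-lower:glacier", "rdfs:label", "glacier"),
    ("ex:sea_ice", "rdfs:label", "sea ice"),
    ("ex:sea_ice", "skos:definition", "placeholder definition text"),
]

def labelled_without_definition(triples, definition_predicate="skos:definition"):
    """Return subjects that carry an rdfs:label but no definition triple."""
    labelled = {s for s, p, _ in triples if p == "rdfs:label"}
    defined = {s for s, p, _ in triples if p == definition_predicate}
    return sorted(labelled - defined)

missing = labelled_without_definition(TRIPLES)
print(missing)
```

The resulting list could serve as the queue of terms to put in front of subject matter experts.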

So I personally think we should just start with SWEET and ENVO and leave geoscienceontologies for the next stage...

I will look at the cryo terms in SWEET and ENVO in my SME role though (if that's a real term). I can tie both to the YAMZ definitions for those terms.... In many cases, there aren't real disagreements about term definitions (though that isn't always true).

bethhuffer commented 7 years ago

Yes, thanks Ruth. I'm inclined to agree with that. In fact, other ontologies such as geoscienceontology.org may want to consider referencing terms like "glacier" in their own ontology to the "canonical" definitions in SWEET and/or ENVO (assuming that what geoscienceontology.org means by glacier is the same as what SWEET and/or ENVO means).

linepouchard commented 7 years ago

I second Ruth's comment @rduerr @graybeal @hsu000001 that we should start with SWEET and ENVO before introducing additional terms. One good place to start would be to run the ontology alignment again on the new version of ENVO.

rduerr commented 7 years ago

Yes, that is exactly what I was trying to do when I created those early sea ice ontologies - relate those more detailed terms back to SWEET terms. I think the current work will allow me to both complete that task and possibly retire my ontologies in favor of ENVO/SWEET terms (at least in many cases) @bethhuffer

bethhuffer commented 7 years ago

I have a use case that is similar to the geoscienceontology.org. My ontology is meant to mark up variables within Earth science data products with similar info to that which is part of the Semantic Sensor Network Ontology. I.e., what's being measured? What quantity of that thing is being measured? Where is it being measured? etc. It would be nice if these types of ontologies, which may be focused on describing data/measurements but nonetheless need to make use of scientific vocabulary that is best left to subject matter experts, could take advantage of efforts such as SWEET and ENVO, and the expertise that is behind them.

linepouchard commented 7 years ago

@rduerr @MichaelHuhns Here are the alignments between SWEET and the Seaice ontologies SWEETtoSeaiceAndBack.zip dated 2014

rduerr commented 7 years ago

@linepouchard Thanks!

graybeal commented 7 years ago

Yes, Ruth has done a convincing review, thanks Ruth. I was thinking that at a minimum, it could be a good source of definitions and of terms in particular contexts. Obviously the constructed terms are much too rich a set to work with.

Thanks for looking it over and offering that, go forward with the plan for sure.

dr-shorthair commented 7 years ago

What is 'Earth Science Idol'? Without that information, this issue has a terrible title - it appears to be largely a discussion of the alignment of SWEET with ENVO, which I have an interest in, but I had been ignoring the email reminders because I assumed it was something frivolous.

Use of informative subject lines/issue titles is important, particularly for those of us looking on from a little further outside. OTOH - If this issue is about something else, then I'm happy to be enlightened but it is not clear from the content of the thread.

dr-shorthair commented 7 years ago

Onto the SWEET-ENVO alignment issue -

I understood that SWEET started with a larger scope than ENVO, but during the development of ENVO, PLB has grown the scope as he realised that this was necessary for a full system description. A key difference between SWEET and ENVO is that ENVO adopted an existing framework (OBO) and its upper ontology (BFO) which come from the life-sciences. I'm not sure if ENVO can be used de-coupled from BFO etc, but if not, then a strong alignment of SWEET to ENVO might come with some costs.

There are a few styles in ontology development. The OBO foundry follows the "one big system integrated through use of a particular upper ontology". Historically SWEET has offered something similar, but with a different upper layer. More recently the 'Ontology Design Patterns' community has proposed that "lots of small, tightly-scoped ontologies, each solving one problem, which can be aligned separately if and only when you need to" might be a more pragmatic approach, and which is also more like what the W3C community has done. It means that your ontological commitment is less, and it is less painful to change your mind and switch later, one piece at a time.

dr-shorthair commented 7 years ago

And on the definitions of soils - maybe involve some pedologists? (I know some). They have a tendency towards descriptive rather than genetic classifications, since it's often geotechnical, fluid transport and agricultural applications that are the top priority. But there is a lot of existing practice there.

graybeal commented 7 years ago

We definitely see those two communities of practice re development styles, and there are strengths in each. SWEET could justify taking either path, but I think a first principle of the chosen style should be the ability to describe a wide range of entities in different semantic contexts, with minimum constraints introduced by the larger context of the ontology.

I'm not offering an opinion about whether the OBO approach, or the existing SWEET context, creates such constraints; I really don't know. I just want that to be considered as a cost, in that SWEET to date has been such a general-purpose artifact, and I assume we want that to continue.

pbuttigieg commented 6 years ago

Greetings all, I'm parsing this thread and will respond tomorrow. In brief, the metadictionary offers a low-barrier forum to gather domain expert definitions and debate for both SWEET and ENVO to harvest, thus aligning their content by examining a shared knowledge source. How these ontologies will relate to each other is much more interesting. More tomorrow!

pbuttigieg commented 6 years ago

Regarding the initial questions:

I see several possibilities and would like to understand which you intend. The possibilities are

  1. Augment ENVO with terms from SWEET (eventually SWEET might not be needed, because ENVO would be a superset of it)
  2. Augment SWEET with terms from ENVO (eventually ENVO might not be needed)

Regarding 1 and 2 (and a bit of 3): To my understanding, SWEET is actually more like a federation of ontologies than a domain ontology like ENVO. In a sense, it's an OBO Foundry parallel for Earth science. ENVO is the main point of contact between these realms due to its subject domain and also its broad scope.

Given the above, the SWEET/ENVO overlap can act as the catalyst for harmonisation of these communities, preventing an unnatural technical and theoretical disconnect between semantics in the life sciences and planetary sciences (our recent release "Planetary Ecology" tries to approach that point).

Other ontologies that would be very relevant to the domain of human activities would be, e.g.:

  3. Augment both SWEET and ENVO with terms from each other (eventually SWEET and ENVO would be identical)

I really believe that the long term goal would be complete interoperation of the various federations of ontologies, and I'm not averse to a full merge. This would be hugely exciting for data science across domains. This won't happen overnight, but committing to this would rally many talented people towards a more production-grade infrastructure which evolves with innovative research while being stable enough to satisfy non-research/academic stakeholders.

This would probably start with strong mappings, reuse, co-development, and shared governance. There are some points of difference to iron out, but pragmatism should rule here: there are good reasons that these resources differ from one another, thus it's our task to find out how to build something that meets all (esp. practical) needs.

The metadictionary that is the target of the ESIP Idol mini-grant is an important building block here: it will rally domain experts around a common, easy-to-work-with forum allowing all semantics federations to be on the same page and direct the domain experts they work with to contribute a shared pool of knowledge. YAMZ is just one option, but I really like that it gives stable identifiers to the discussions themselves, which can be cited as sources in the ontology classes that derive from them.

  4. Identify the intersection of SWEET and ENVO, which in some sense would be a core ontology for the environmental domain. It might even be maintained separately, with SWEET and ENVO being unique extensions of it

I see the point here, but I'm slightly averse to creating and maintaining another ontology. I think the strategy outlined above will be more sustainable in the long-term.

  5. Create mappings between terms in SWEET and terms in ENVO (this is what Line Pouchard’s student did ~2 years ago), so that the union of SWEET and ENVO could be used, while their separate development could continue.

As noted above, mapping will be very important in the early phases, but I'm convinced this should pave the way to some sort of merged solution with shared governance.

One more thing, I have a concern about using yamz, because it appears to operate on one pair of terms at a time. As you know, the power of an ontology is not in the terms and their attached English definition, but is in the relationships among the terms. For example, the term “soil” might be a verb or a noun, and this is determined by whether it is a subclass of the term “Process” or a subclass of the term “Physical Object.”

I agree with this issue, but if the discourse is used as a source to be parsed and ontologised (rather than taken as is), I think the semantics will be cleaned up. As @dr-shorthair notes, we need to solicit real domain expert input on any disagreements or tricky definitions (probably most of them when you look closely enough). Again, this is why having a shared space for experts to go to would speed things along (e.g. a function to call for a "verified" pedologist could be added to YAMZ to moderate such threads).

This is a good development target: Allow the issuing of identifiers to every alternative definition in a thread, and allow each definition to be voted up or down. A filter to see only the votes of verified domain experts would help solve the dangers of crowd sourcing.
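A rough sketch of what that target could look like as a data model: each alternative definition gets its own identifier and collects up/down votes, and the score can be restricted to verified experts. None of this reflects how YAMZ is actually implemented; the class, the counter-based identifiers, and the voter names are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count

# Stand-in for whatever identifier scheme the forum would really use.
_ids = count(1)

@dataclass
class Definition:
    text: str
    ident: int = field(default_factory=lambda: next(_ids))
    votes: list = field(default_factory=list)  # (voter, +1 or -1) pairs

    def vote(self, voter: str, value: int) -> None:
        """Record an up (+1) or down (-1) vote from a named voter."""
        if value not in (1, -1):
            raise ValueError("vote must be +1 or -1")
        self.votes.append((voter, value))

    def score(self, verified=None) -> int:
        """Sum votes; if `verified` is given, count only those voters."""
        return sum(
            value
            for voter, value in self.votes
            if verified is None or voter in verified
        )

# Hypothetical voters and candidate definitions.
verified_experts = {"pedologist_a"}
d1 = Definition("candidate definition A")
d2 = Definition("candidate definition B")
d1.vote("pedologist_a", 1)
d1.vote("anon_1", -1)
d1.vote("anon_2", -1)

print(d1.ident, d1.score(), d1.score(verified_experts))
```

In this toy run the unfiltered score is negative while the expert-only score is positive, which is the situation the filter is meant to expose.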

Must run, but more on the other comments later...

pbuttigieg commented 6 years ago

PS: We also need to be clear on how far on the semantic expression scale SWEET wants to go: dense axioms à la some OBO resources (e.g. one of ENVO's biofilm environment classes created for the Earth Microbiome Project) or more a highly structured thesaurus or glossary?

In the case of the former, working towards a full merge would probably save a lot of pain down the line. In the case of the latter, a tight synchronisation (beyond mapping, which is still quite error prone) would probably be enough.

PPS: SWEET has a bunch of chemical terms, I'm assuming it would import CHEBI? Doing that de novo is a scary prospect.

pbuttigieg commented 6 years ago

In response to @MichaelHuhns comment here https://github.com/ESIPFed/sweet/issues/33#issuecomment-320153021

Thanks for the explanation! Now I see two daunting problems:

Response to first daunting problem

If you try to use yamz to reconcile "Soil" in ENVO with "Soil" in SWEET, the crowd will find that the one in SWEET has no definition and the one in ENVO has a very detailed definition. Should ENVO's definition be copied to SWEET? No. This would be a bad idea, because "Soil" is a different concept in the two ontologies. The crowd would have to look beyond the terms to discover that "Soil" is a subclass of "Sediment" in SWEET, but "Soil" and "Sediment" are mutual subclasses of "Environmental Material" in ENVO. Furthermore, "Soil" and "Sand" are parallel concepts in SWEET, but "Soil" hasPart "Sand" in ENVO. Seriously, reconciling concepts between ontologies is not solvable by people "weighing in on the definition of a term."

Agreed. I think the YAMZ discourse should actually be blind to the ontologies that are watching it. The point would be to see what the domain experts say about the definitions and varying usage and then have a common corpus to ontologise. This would, en passant, aid tighter and tighter alignment of the resources.

Furthermore, "Soil" and "Sand" are parallel concepts in SWEET, but "Soil" hasPart "Sand" in ENVO.

Just a short note: the part of relation doesn't override the subclass relations between environmental material, soil, and sand in ENVO. It's just a deeper axiomatisation relative to a subclass-only resource.

Response to second daunting problem

I have low expectations about the "wisdom of a crowd." The crowd thinks that a dolphin is a fish. How do you propose to avoid and fix such misconceptions? In my experience, the best ontologies have emerged out of a working partnership between someone who understands a domain and someone who understands the domain-independent parts of an ontology metamodel (subclasses, parts, instances, causes, occurs, etc.).

Yes, ENVO's best parts have grown with expert consultation. I think I've already expressed where I stand on how to use the "crowd" (hopefully more a crowd of experts than not) knowledge: as raw material to base semantic clean up discussions around.

cmungall commented 6 years ago

highly preliminary work mapping SWEET to OBOs: https://github.com/cmungall/sweet-obo-alignment (happy to move to ESIPFed org when more mature)

lewismc commented 6 years ago

Very good @cmungall this is a significant step in the right direction. Am I right in thinking the pipeline is generic enough for us to utilize it for additional alignments e.g. for https://github.com/ESIPFed/sweet/issues/27 ?

pbuttigieg commented 6 years ago

@cmungall Could this be rerun? SWEET has updated its URIs.

dr-shorthair commented 6 years ago

For an example of an alignment graph held separate from the target ontologies, see https://github.com/w3c/sdw/blob/gh-pages/ssn/rdf/sosa-bco-mapping.ttl and others alongside

Note that the sosa-bco map also needs some property alignments. https://github.com/w3c/sdw/blob/gh-pages/ssn/rdf/sosa-prov-mapping.ttl and https://github.com/w3c/sdw/blob/gh-pages/ssn/rdf/sosa-oboe-mapping.ttl are more complete.

lewismc commented 6 years ago

@dr-shorthair can you link directly to an example of how sosa-bco-mapping.ttl is used/consumed in sosa? Thanks

dr-shorthair commented 6 years ago

sosa-bco-mapping.ttl is not used/consumed in sosa. This is merely a formal (axiomatic) documentation of the alignment.

It might be loaded as part of a reasoning exercise, along with both of the target vocabularies, in order to get the combined axiomatization from both SOSA/SSN and OBO, which might add value to a dataset expressed using one of the vocabularies when used in a context where the other one is usually used.
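A toy illustration of that reasoning exercise, under heavy assumptions: triples are plain tuples, every identifier is a placeholder rather than a real SOSA/SSN or OBO term, and the "reasoner" is a naive loop over `rdfs:subClassOf` and `owl:equivalentClass` only, nowhere near full OWL semantics.

```python
# Placeholder vocabularies, alignment graph, and one data triple.
VOCAB_A = {("a:Observation", "rdfs:subClassOf", "a:Activity")}
VOCAB_B = {("b:Assay", "rdfs:subClassOf", "b:PlannedProcess")}
ALIGNMENT = {("a:Observation", "owl:equivalentClass", "b:Assay")}
DATA = {("ex:obs1", "rdf:type", "a:Observation")}

def infer_types(triples):
    """Naive forward chaining; returns all (instance, class) pairs."""
    graph = set(triples)
    # Collect subclass edges, treating equivalence as subclass both ways.
    sub = {(s, o) for s, p, o in graph if p == "rdfs:subClassOf"}
    for s, p, o in graph:
        if p == "owl:equivalentClass":
            sub |= {(s, o), (o, s)}
    changed = True
    while changed:
        changed = False
        for s, p, o in list(graph):
            if p != "rdf:type":
                continue
            for child, parent in sub:
                if o == child and (s, "rdf:type", parent) not in graph:
                    graph.add((s, "rdf:type", parent))
                    changed = True
    return {(s, o) for s, p, o in graph if p == "rdf:type"}

without_alignment = infer_types(VOCAB_A | VOCAB_B | DATA)
with_alignment = infer_types(VOCAB_A | VOCAB_B | ALIGNMENT | DATA)
print(sorted(with_alignment - without_alignment))
```

Because the alignment lives in its own set, neither vocabulary has to import or even know about the other; the extra types appear only when all the pieces are loaded together.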

lewismc commented 6 years ago

OK thanks @dr-shorthair , I'll make sure we keep this in mind when we talk next week. Thanks

dr-shorthair commented 6 years ago

FWIW - I just took a peek at the subsumption hierarchy of soils in ENVO. AFAICT the top of the hierarchy is just fine. There are no 'disjoint' axioms with organics to cause any trouble.

However, immediately below that are a few dozen classes of soils like podzol, acrisol, etc. The axiomatization of these (binding properties to particular values and ranges) looks fine too, but this reflects just one, relatively simple, classification of soil types (mostly from USDA?).

There are other systems, which can lead to much bigger classifications (for example, the Australian Soil Classification http://www.publish.csiro.au/book/7428 - we are working on an RDF view of this ... ).

To me it looks like a mistake to do this in the ENVO namespace - it implies that the classification shown is universal, or endorsed by ENVO. ENVO should stop at 'soil'.

I will provide this feedback at ENVO as well.

pbuttigieg commented 6 years ago

Hi @dr-shorthair

ENVO's not an authority for classification and we don't intend to endorse one system over another. I'll make that more clear in the README.md. We just offer an OBO expression of some existing classifications and weave in individual or project-based input as needed.

I don't think it will be a problem to host multiple classification systems - classes will remain disjoint if their differentia (which we can axiomatise as densely as needed) don't overlap exactly.

Let's branch off this discussion on an issue in ENVO. Perhaps this one: https://github.com/EnvironmentOntology/envo/issues/333

If you're working on the Australian case, I'd be happy to figure out how to interoperate. I would like to see more examples of ENVO importing focused projects like this (retaining their namespace) and making them discoverable through its more broadly scoped content.

lewismc commented 4 years ago

YAMZ is dead. Closing this off.