ESIPFed / sweet

Official repository for Semantic Web for Earth and Environmental Terminology (SWEET) Ontologies

Global Change Master Directory mappings #159

Open · cmungall opened this issue 4 years ago

cmungall commented 4 years ago

See also #23

Are there plans for a mapping to GCMD? https://gcmdservices.gsfc.nasa.gov/kms/concepts/concept_scheme/sciencekeywords

cc @alisonboyer

lewismc commented 4 years ago

This is personally something I would like to see. So far I've been largely unable to motivate anyone on the GCMD side to take up the reins. That said, I let that deter me too easily.

I would be willing to assist.

graybeal commented 4 years ago

In the past, I would not consider doing GCMD mappings because there was no rigor in the terms, and no definitions, so it was very hard (or maybe idiosyncratic) to know what they mean. (Also, for a long time I couldn't get permission to put them in the repository.)

I haven't followed their work for the last 5 years or more, though, so it could be a lot better now. And I'm sure a lot of people use them. So it would be great to (a) have them in the repository, and (b) have a mapping available. (I wonder if syntactic mappings might be particularly effective in this case?)

lewismc commented 4 years ago

Data versions of the GCMD instruments, platforms and science keywords do exist in COR... they are also served as linked data if you navigate to the base IRI. From what I understand these were added by @tbs1979 some time ago.

Clearly no automated process has been established for linking the above resources (or the GCMD modules more generally) to any other resource such as SWEET.

brandonnodnarb commented 4 years ago

I had been thinking about this previously and then was sidetracked by other work (well, the things I actually get paid to do). A few months ago I scraped the GCMD RDF files, as well as the NASA Thesaurus, from their respective sites so I could investigate locally.

Thus far, I have extracted only the URI and prefLabel for each concept so I could do a string match/dictionary comparison to get an idea of scale. I realise this is a very low-tech and hacky way to address this type of problem, and there are likely better approaches. However, as neither GCMD nor SWEET has any real definitions (yet), a syntactic matching may in fact be a valid first cut (as @graybeal already mentioned).
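A minimal sketch of that kind of dictionary comparison, assuming each vocabulary has already been reduced to a URI-to-prefLabel dictionary. The normalization rules (case-folding, treating underscores and hyphens as spaces) and all URIs below are hypothetical placeholders, not the contents of the actual CSV files.

```python
import re
from collections import defaultdict


def normalize(label: str) -> str:
    """Case-fold and collapse whitespace/underscores/hyphens so 'SEA ICE' == 'sea ice'."""
    label = label.casefold().replace("_", " ").replace("-", " ")
    return re.sub(r"\s+", " ", label).strip()


def label_index(concepts: dict) -> dict:
    """Map each normalized prefLabel to the list of URIs that carry it."""
    index = defaultdict(list)
    for uri, label in concepts.items():
        index[normalize(label)].append(uri)
    return index


def exact_matches(left: dict, right: dict) -> list:
    """Return sorted (normalized label, left URI, right URI) triples sharing a label."""
    right_index = label_index(right)
    pairs = []
    for uri, label in left.items():
        key = normalize(label)
        for other in right_index.get(key, []):
            pairs.append((key, uri, other))
    return sorted(pairs)


# Hypothetical URI -> label dictionaries standing in for the extracted columns.
sweet = {"sweet:StratigraphicSequence": "stratigraphic sequence", "sweet:SeaIce": "sea ice"}
gcmd = {"gcmd:1": "STRATIGRAPHIC SEQUENCE", "gcmd:2": "SEA ICE", "gcmd:3": "SEA ICE",
        "gcmd:4": "GLACIERS/ICE SHEETS"}

for match in exact_matches(sweet, gcmd):
    print(match)
```

Note that one SWEET label can match several GCMD URIs, which is exactly what happens when the same GCMD keyword appears in more than one place in the tree; such hits still need a human to decide whether they are the same concept.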

Unfortunately, GCMD has a lot of labels containing forward slashes. For example, one concept URI has the skos:prefLabel "NASA/GSFC/SED/ESD/LANDSAT/ED". There are many...idiosyncrasies...like this that I have not accounted for yet.
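One possible way to account for slash labels in a syntactic first cut is to generate candidate strings from each label: the full label plus each slash-separated segment. This is a heuristic assumption, not GCMD's documented semantics; segments of an organization path such as the NASA example are parts of a hierarchy rather than synonyms, so any match produced from a segment should be reviewed by hand.

```python
def label_variants(label: str) -> list:
    """Strings to try when matching a GCMD prefLabel that may contain '/'.

    The full label comes first, followed by each non-empty slash-separated
    segment, de-duplicated and kept in its original order.
    """
    full = label.strip()
    variants = [full]
    for part in full.split("/"):
        part = part.strip()
        if part and part not in variants:
            variants.append(part)
    return variants


print(label_variants("ABLATION ZONES/ACCUMULATION ZONES"))
print(label_variants("NASA/GSFC/SED/ESD/LANDSAT/ED"))
print(label_variants("SEA ICE"))
```

Each variant can then be normalized and looked up in the other vocabulary's label index, while remembering which variant produced the hit so segment-only matches can be flagged as weaker evidence.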

I had a repo with all the RDF files, scripts and CSV files together, but it only just occurred to me that NASA probably don't want all their RDF files sitting in an open repo on github. :) As such, I have removed all of that and put the two previously mentioned CSV files (tab delimited) in an open repo: https://github.com/brandonnodnarb/SWEET-mappings-staging.

If any of you find these useful, have at it.

@graybeal, I had to chuckle when you mentioned GCMD not having definitions...said the pot to the kettle :)

dr-shorthair commented 4 years ago

One key question that arises when trying to make sense of GCMD is that quite a few terms appear in more than one place in the tree. Are they the same concept in different contexts?

rduerr commented 4 years ago

Generally yes... many of the cryosphere terms end up in two places - typically cryo and ocean for sea ice related terms.


(Replying by email to Simon Cox, Sep 28, 2019, 8:58 PM)


graybeal commented 4 years ago

That was my thought also. I think there were a very few cases that I would call a 'mistake', in that the same term was used in two different places where it really meant something different. (That is, a clear syntactic collision that was overlooked, or maybe acceptable since they were in different ontologies.) Those may have all been fixed by now.

dr-shorthair commented 4 years ago

But the same short name and definition in different places has a different Number, UUID, and Path, e.g.:

| Number | Short Name | UUID | Definition | Path |
|---|---|---|---|---|
| 1806 | ABLATION | ad793d5e-b75d-4d3e-a542-ad4b4075b141 | The process of removal of material from the surface of an object by vaporization, chipping, or other erosive processes. The term occurs in spaceflight associated with atmospheric reentry, in glaciology, medicine, and passive fire protection. | EARTH SCIENCE\|LAND SURFACE\|GEOMORPHIC LANDFORMS/PROCESSES\|GLACIAL PROCESSES\|ABLATION |
| 2485 | ABLATION | 99db4dca-4d07-48fd-8ba3-393532d04aa6 | The process of removal of material from the surface of an object by vaporization, chipping, or other erosive processes. The term occurs in spaceflight associated with atmospheric reentry, in glaciology, medicine, and passive fire protection. | EARTH SCIENCE\|SOLID EARTH\|GEOMORPHIC LANDFORMS/PROCESSES\|GLACIAL PROCESSES\|ABLATION |
| 1382 | ABLATION ZONES/ACCUMULATION ZONES | 95fbaefd-1afe-4887-a1ba-fc338a8109bb | Pertaining to the reduction of a glacier due to melting and/or evaporation. | EARTH SCIENCE\|CRYOSPHERE\|GLACIERS/ICE SHEETS\|ABLATION ZONES/ACCUMULATION ZONES |
| 2859 | ABLATION ZONES/ACCUMULATION ZONES | a994a6f6-cfcd-45d2-95a4-0f8455a9454d | Pertaining to the reduction of a glacier due to melting and/or evaporation. | EARTH SCIENCE\|TERRESTRIAL HYDROSPHERE\|GLACIERS/ICE SHEETS\|ABLATION ZONES/ACCUMULATION ZONES |

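Rows like these can be detected mechanically. A minimal sketch, assuming the keyword export has been parsed into records with the five columns shown above; it treats "same short name and same definition, different UUID" as a candidate duplicate, which is a heuristic rather than a GCMD rule.

```python
from collections import defaultdict
from typing import NamedTuple


class KeywordRow(NamedTuple):
    number: int
    short_name: str
    uuid: str
    definition: str
    path: str


def duplicate_concepts(rows: list) -> list:
    """Group rows by (short name, definition); keep groups spanning >1 UUID.

    Returns sorted (short name, sorted list of paths) tuples.
    """
    groups = defaultdict(list)
    for row in rows:
        # strip() absorbs stray whitespace, e.g. a trailing space after a short name
        key = (row.short_name.strip().upper(), row.definition.strip())
        groups[key].append(row)
    result = []
    for (name, _definition), members in groups.items():
        if len({m.uuid for m in members}) > 1:
            result.append((name, sorted(m.path for m in members)))
    return sorted(result)
```

Fed the four rows from the table, it returns two groups, one for ABLATION and one for ABLATION ZONES/ACCUMULATION ZONES, each listing its two paths. Each group is a candidate for a single mapping target on the SWEET side.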
tbs1979 commented 4 years ago

Hi All,

The logic behind why some of the GCMD science keywords appear in multiple places within the hierarchy (for example, 'Sea Ice' under 'Cryosphere' and 'Oceans') is that when users use the keyword facets in a search interface, they still find the keyword regardless of which discipline path they go down. Based on past user search behavior, cryospheric scientists might look for Sea Ice under Cryosphere, and oceanographers might look for Sea Ice under Oceans. We did not want users to "miss" the keyword when doing facet-type searching.

I wonder if this logic is becoming obsolete now with more advanced ontologies and search capabilities; however, the GCMD keywords are considered a controlled vocabulary and not necessarily a full ontology.
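For comparison, the same facet behavior can in principle come from a polyhierarchy, where one concept (one ID) has several broader parents, as SKOS permits with multiple skos:broader links. A minimal sketch with hypothetical identifiers, assuming the broader links form an acyclic graph:

```python
from collections import defaultdict

# Hypothetical IDs: a single "sea_ice" concept with two broader parents.
BROADER = {
    "sea_ice": ["cryosphere", "oceans"],
    "cryosphere": ["earth_science"],
    "oceans": ["earth_science"],
    "earth_science": [],
}


def narrower_index(broader: dict) -> dict:
    """Invert broader links into parent -> direct children, as a facet drill-down needs."""
    index = defaultdict(list)
    for child, parents in broader.items():
        for parent in parents:
            index[parent].append(child)
    return index


def facet_paths(concept: str, broader: dict) -> list:
    """All root-to-concept paths; a concept with n parents shows up under each one."""
    parents = broader.get(concept, [])
    if not parents:
        return [[concept]]
    return [path + [concept] for p in parents for path in facet_paths(p, broader)]


print(narrower_index(BROADER)["cryosphere"], narrower_index(BROADER)["oceans"])
print(facet_paths("sea_ice", BROADER))
```

Drilling down from either Cryosphere or Oceans reaches the same `sea_ice` ID, so users still find the keyword from both disciplines without a second UUID being minted.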

Your feedback is greatly appreciated.

Thanks,

Tyler Stevens KBR | Senior Discipline Engineer, NASA EED-2




graybeal commented 4 years ago

I removed my last two comments (hid the first, deleted the second) as I had a brain fail and started talking about SWEET. So sorry!

brandonnodnarb commented 4 years ago

The attached image shows all concepts, and their definitions, with the label "STRATIGRAPHIC SEQUENCE".

[Image: GCMD_StratigraphicSequence]

For reference, SWEET currently has 'stratigraphic sequence' defined as a class, a subclass of 'history', without a natural-language definition, in phenGeol.ttl.

One option would be to create a skos:related link between the SWEET concept and the GCMD concepts in a separate mapping file. A custom subproperty of skos:related could also be defined that drops the symmetry of the original relation, i.e. asserting that property from SWEET:a to GCMD:a would not entail the same property from GCMD:a to SWEET:a, as it would with skos:related.
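A toy forward-chaining sketch of the two options, applying only the rdfs:subPropertyOf rule and the symmetry that SKOS declares for skos:related. The property name `ex:relatedTo` and the `sweet:a`/`gcmd:a` IDs are hypothetical placeholders. One caveat the sketch makes visible: the custom property's own inverse is not entailed, but because it is a subproperty of skos:related, the inverse skos:related triple still is.

```python
SKOS_RELATED = "skos:related"
MAPS_TO = "ex:relatedTo"  # hypothetical custom, non-symmetric property

SYMMETRIC = {SKOS_RELATED}  # SKOS declares skos:related an owl:SymmetricProperty
SUBPROPERTY_OF = {MAPS_TO: SKOS_RELATED}  # direct superproperty only, for brevity


def closure(triples: set) -> set:
    """Apply the subPropertyOf and symmetry rules until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p in SUBPROPERTY_OF:
                new.add((s, SUBPROPERTY_OF[p], o))
            if p in SYMMETRIC:
                new.add((o, p, s))
        if new <= inferred:
            return inferred
        inferred |= new


plain = closure({("sweet:a", SKOS_RELATED, "gcmd:a")})
custom = closure({("sweet:a", MAPS_TO, "gcmd:a")})
print(sorted(plain))
print(sorted(custom))
```

If the goal is to avoid any inverse assertion at all, the custom property would need to sit outside skos:related; keeping the links in a separate mapping file, as suggested, also keeps these inferences out of the core SWEET modules.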

I suppose there could also be an rdfs:isDefinedBy relation from the SWEET class to one of the cited definitions, with a skos:related link between the others, but that doesn't seem wise at present.

Thanks for chiming in, @tbs1979. That's good info. You are correct: there don't need to be unique instances (unique IDs) of the same concept in order for it to participate in different hierarchies, facets, or whatever. That's sort of the point: if it's the same thing (by definition), the things should be...the same. :) (This last sentence was supposed to be sarcasm; I hope it comes through that way.)

Is there any capacity or will/want at NASA to modify or perhaps re-work GCMD?

lewismc commented 4 years ago

Yes, thanks for chiming in, @tbs1979. We captured information relating to the above in CMRQ-2485. At the time, this was feedback from the ESIP 2017 conference. In June '19 this ticket and its child tickets were marked as Deferred, so I assumed that no work was being done here.

@tbs1979 would you be open to having a dedicated GCMD session/session track at the upcoming ESIP Winter meeting to address the issues highlighted above?

tbs1979 commented 4 years ago

@lewismc Regarding CMRQ-2485, this work has not been set as a high priority by ESDIS, so it may not get worked on for a while. I will relay your interest back to ESDIS.

Regarding a GCMD session on keywords, we have had sessions on these topics in the past, so I don't know if there is anything new to add until some of the enhancements are made on our end. Let me see what our plans are for the Winter ESIP meeting and what interest there is in mappings to SWEET. I think there would be some benefit there, but I need to look at the LOE (level of effort).

lewismc commented 4 years ago

Thanks @tbs1979

> this work has not been set as a high priority by ESDIS

Understood. I think this is because we've not properly communicated that this is important for some ongoing initiatives. If we were to communicate it then it may be escalated.

> we have had sessions in the past regarding the topics

Yes and I've attended a few of them. I think this issue concerns a different part of GCMD though. This is a focused effort which aims to achieve something very specific...

tbs1979 commented 4 years ago

@lewismc and all, perhaps we can discuss some of your ideas and issues about the GCMD keywords at an upcoming telecon before we bring them to the broader ESIP community? When is your next committee telecon?

lewismc commented 4 years ago

@tbs1979 the next committee meeting is on the 4th Tuesday in October (2019-10-22).

SemTech Monthly Telecon

- 4th Tuesday of each month at 4pm Eastern
- GoToMeeting: https://www.gotomeeting.com/join/976796333
- Phone Access: United States: +1 (872) 240-3212
- Access Code: 976-796-333

tbs1979 commented 4 years ago

@lewismc Do you want to add a short discussion of the GCMD keywords to the agenda for that meeting?

lewismc commented 4 years ago

@tbs1979 I'm not quite understanding... Do you want to chat beforehand? Or do you want to dedicate time at the meeting? Please clarify.

Basically on our end (the GCMD contributor and consumer community) we have been, for some time, providing guidance to you guys (the GCMD developers and maintainers) on how the service would better meet the needs of the community. I think this information flow has at times been ad-hoc and has therefore lost its focus and emphasis. I feel that by you (and the GCMD decision makers) both attending the ESIP SemTech meeting and receiving feedback from the community, you would be in a better position to 1) focus and document the feedback, and 2) get a better idea of what you should perhaps prioritize moving forward.

Really, what we are trying to do is align SWEET with GCMD. Right now, due to how GCMD is structured, it is more difficult than it needs to be. I hope this clarifies.

tbs1979 commented 4 years ago

@lewismc A few of us can attend the next SemTech Monthly Telecon to get the discussion started if that is ok with you. We can discuss some of the activities and a path forward. Can you send me an invite to the telecon? Thanks.

lewismc commented 4 years ago

Done @tbs1979

cmungall commented 4 years ago

Was there any update on this?

An update from me: as part of the NMDC project, we are mapping GCMD to ENVO and other OBO ontologies:

https://github.com/microbiomedata/nmdc-metadata/issues/59

As a start we have made a repo for the mapping pipeline here:

https://github.com/EnvironmentOntology/obo-to-gcmd-mapping

Our focus is OBO to GCMD, but we'll also align SWEET; then we can combine with SWEET-OBO mappings for mutual consistency...
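One way such a combination could be checked, sketched under simplifying assumptions: each mapping is a plain dictionary from a source ID to a set of target IDs, and a direct SWEET-to-GCMD mapping is called consistent only if it equals the mapping obtained by chaining SWEET-to-OBO with OBO-to-GCMD. All IDs below are hypothetical, and exact set equality is deliberately strict; real mappings with broad/narrow/related semantics would need a more nuanced comparison.

```python
def compose(ab: dict, bc: dict) -> dict:
    """Chain two mappings (source -> set of targets) through the middle vocabulary."""
    out = {}
    for a, bs in ab.items():
        targets = set()
        for b in bs:
            targets |= bc.get(b, set())
        if targets:
            out[a] = targets
    return out


def inconsistencies(direct: dict, ab: dict, bc: dict) -> dict:
    """Terms whose direct mapping differs from the composed one.

    Each value is (only in direct, only in composed), both sorted.
    """
    composed = compose(ab, bc)
    report = {}
    for term in direct.keys() | composed.keys():
        d, c = direct.get(term, set()), composed.get(term, set())
        if d != c:
            report[term] = (sorted(d - c), sorted(c - d))
    return report


# Hypothetical IDs for illustration only.
sweet_gcmd = {"sweet:SeaIce": {"gcmd:1", "gcmd:2"}, "sweet:Lake": {"gcmd:9"}}
sweet_obo = {"sweet:SeaIce": {"obo:A"}, "sweet:Lake": {"obo:B"}}
obo_gcmd = {"obo:A": {"gcmd:1", "gcmd:2"}, "obo:B": {"gcmd:8"}}

print(inconsistencies(sweet_gcmd, sweet_obo, obo_gcmd))
```

Here the sea ice mappings agree through both routes, while the lake mapping is flagged because the direct and composed routes point at different GCMD concepts.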