ESIPFed / sweet

Official repository for Semantic Web for Earth and Environmental Terminology (SWEET) Ontologies

ISSUE-125 Use wikidata to provide skos:definition to owl:Class'es #208

Closed lewismc closed 3 years ago

lewismc commented 4 years ago

This issue addresses all prior suggestions contained in related PRs #205, #203, #202, #201 and #200.

@smrgeoinfo it also removes all spurious definitions as you had highlighted.

@brandonnodnarb this addresses all of the issues you pointed out.

Please review and provide any feedback. Thanks

smrgeoinfo commented 4 years ago

I did not do a complete pass through all the suggested Wikidata definition mappings. I suspect there are other defs that should be rejected... I can work on that but it will take a few days.

lewismc commented 4 years ago

@smrgeoinfo ... no problem. I am in no rush to close this issue. If and when you can review, please do. Thank you

brandonnodnarb commented 4 years ago

link to spreadsheet for convenience

I'll start from the last entry and work my way up (as I have time this weekend).

wdduncan commented 4 years ago

@lewismc

Have you looked at Chris Mungall's sparqlprog tool for Wikidata? https://github.com/cmungall/sparqlprog_wikidata

It may help mine data from Wikidata.

lewismc commented 4 years ago

Hi @wdduncan

Have you looked at Chris Mungall's

Yeah, I did previously. Great piece of kit! Once this issue is done, I think I'll approach @cmungall again and see if we can re-run/update some of his previous efforts in this area. Thanks for dropping in.

rrovetto commented 4 years ago

I added #210 with some recommendations. So far I have spent a few hours reviewing and adding input to the spreadsheet. If an online spreadsheet is preferred over adding issues, let me know. I also made this doc to summarize meta considerations/method. Will continue as able.

lewismc commented 4 years ago

Excellent @rrovetto thank you so much. I'll incorporate these into the solution when I get a minute.

brandonnodnarb commented 4 years ago

@lewismc are the previous iterations of SWEET available on COR? I ask because, from memory, there were previously Wikipedia definitions in SWEET. I'm not sure which version, or why they were removed, but they may be useful for disambiguation in this task.

brandonnodnarb commented 4 years ago

Just had a look: apparently I still have a local copy of at least a few previous versions of SWEET :) On a quick grep through, it appears that there was text in the rdfs:comment tags/files with a [Wikipedia] reference (textual; no link). Stats attached (TSV, saved with a .txt extension to conform to GitHub's allowed attachment types).

There is no link, version, or any other information, but there is text that could be compared using a similarity function, which may, at minimum, rule out the non-domain faff.

Does this help, or hurt? :) SWEET_wikipedia_refs_stats.txt
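
One way to apply that comparison, as a minimal sketch: the Python below assumes the legacy `[Wikipedia]` rdfs:comment text and the candidate Wikidata definitions have already been extracted into dictionaries keyed by class name. The helper names, the sample texts and the 0.3 threshold are hypothetical; the standard library's `difflib.SequenceMatcher` (compared word by word) stands in for whatever similarity function is eventually chosen.

```python
import re
from difflib import SequenceMatcher


def tokens(text: str) -> list[str]:
    # Lowercase and keep only alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(a: str, b: str) -> float:
    # Word-level SequenceMatcher ratio in [0, 1].
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()


def triage(legacy: dict[str, str], candidates: dict[str, str], threshold: float = 0.3):
    """Split candidate definitions into accepted, rejected and needs-review."""
    accepted, rejected, needs_review = {}, [], []
    for cls, definition in candidates.items():
        old = legacy.get(cls)
        if old is None:
            needs_review.append(cls)  # no legacy text to compare against
        elif similarity(old, definition) >= threshold:
            accepted[cls] = definition
        else:
            rejected.append(cls)  # likely a label-only (off-domain) match


    return accepted, rejected, needs_review


# Hypothetical sample data, for illustration only.
legacy = {
    "Glacier": "A glacier is a persistent body of dense ice that is constantly moving under its own weight.",
    "Pike": "A pike is a pointed or peaked summit.",
}
candidates = {
    "Glacier": "persistent body of ice that is moving under its own weight",
    "Pike": "species of fish",
    "Aquifer": "underground layer of water-bearing permeable rock",
}
accepted, rejected, needs_review = triage(legacy, candidates)
print(accepted, rejected, needs_review)
```

A threshold like this only filters out obvious mismatches; classes with no legacy text, and anything near the cut-off, would still need a human look.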

lewismc commented 4 years ago

@brandonnodnarb

are the previous iterations of SWEET available on COR?

Yes, for any given resource just navigate to the versions pulldown and click on whichever version you wish to view : )

lewismc commented 4 years ago

Hi folks, any further comments here? Thank you

brandonnodnarb commented 3 years ago

It's still on my radar. Haven't had time to dig into it properly.

pbuttigieg commented 3 years ago

Discussed on today's SemTech call.

General feeling: as long as this doesn't overwrite the Semantic Harmonization work, and the issue of label/domain matches (see below) is addressed, we can move forward.

Things should be clear as long as we're clear about who (e.g. Wikidata, ENVO) is making the definitional claim (e.g. by annotating the definition annotation property).
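
As a minimal sketch of that provenance pattern, assuming definitions are serialized as Turtle: the Python below emits a skos:definition plus an OWL 2 axiom annotation (an owl:Axiom with owl:annotatedSource, owl:annotatedProperty and owl:annotatedTarget) that records the source of the claim. Using dcterms:source for the source, the function names and the example IRIs are all assumptions, not an agreed SWEET convention.

```python
PREFIXES = """@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
"""


def turtle_literal(text: str) -> str:
    # Escape backslashes, quotes and newlines for a Turtle short string.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@en'


def definition_with_source(class_iri: str, definition: str, source_iri: str) -> str:
    """Return Turtle stating the definition and who makes the claim."""
    lit = turtle_literal(definition)
    return (
        PREFIXES
        + f"\n<{class_iri}> skos:definition {lit} .\n\n"
        + "[] a owl:Axiom ;\n"
        + f"   owl:annotatedSource <{class_iri}> ;\n"
        + "   owl:annotatedProperty skos:definition ;\n"
        + f"   owl:annotatedTarget {lit} ;\n"
        + f"   dcterms:source <{source_iri}> .\n"
    )


# Hypothetical IRIs, for illustration only.
print(definition_with_source(
    "http://example.org/sweet#Glacier",
    "persistent body of ice that is moving under its own weight",
    "http://www.wikidata.org/entity/Q_EXAMPLE",
))
```

Because the source sits on the annotation rather than in the definition text, definitions from Wikidata, ENVO or SWEET's own editors can coexist on one class and still be told apart.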

The lack of domain matching, in favour of simple label matching, is a major issue: some matches are simply wrong. Attempting to match class hierarchies in Wikidata and SWEET is likely to help; note that the semantics here are constrained to structured labels.
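
A minimal sketch of what matching on hierarchy as well as label could look like, assuming each term's label and its direct parent labels have already been retrieved (for Wikidata, for example, via the P279 "subclass of" property; for SWEET, via rdfs:subClassOf). The `Term` class, the sample labels and the one-level parent check are hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    parent_labels: set[str] = field(default_factory=set)
    definition: str = ""


def norm(label: str) -> str:
    # Case- and whitespace-insensitive comparison key.
    return " ".join(label.lower().split())


def domain_match(sweet: Term, candidate: Term) -> bool:
    """Require an equal label AND at least one shared parent label."""
    if norm(sweet.label) != norm(candidate.label):
        return False
    shared = {norm(p) for p in sweet.parent_labels} & {norm(p) for p in candidate.parent_labels}
    return bool(shared)


# Hypothetical terms, for illustration only.
sweet_front = Term("Front", {"Atmospheric Phenomenon"})
candidates = [
    Term("front", {"military term"}, "line of contact between opposing forces"),
    Term("front", {"atmospheric phenomenon"}, "boundary between two air masses"),
]
accepted = [c for c in candidates if domain_match(sweet_front, c)]
print([c.definition for c in accepted])
```

A one-level parent check is strict; a fuller version might compare ancestors several levels up, and would still leave borderline cases for a human curator.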

Splitting this PR into realm-by-realm tasks may help focus the work and spot issues in a contained space. There should be a human review process to curate the auto-populated definitions, or the definitions should be kept in an experimental or development branch. This branch could be coupled with a pre-release version (as with the OBO *-edit.owl files).

lewismc commented 3 years ago

Now that #211 is resolved, I will revisit this issue and update the conflicts.

brandonnodnarb commented 3 years ago

If interested, we could sort out #218 first and then work on developing a more intelligent filter for matching and generating results.

lewismc commented 3 years ago

Yes Brandon, excellent idea.


brandonnodnarb commented 3 years ago

Now that #218 is closed and its PR (#246) has been merged into master, I'd like to revisit the approach. This likely has broader implications for automating (or semi-automating) the addition of definitions from multiple resources.

Related to #225

lewismc commented 3 years ago

Nice work @brandonnodnarb. I'm happy to close this one off and regenerate the PR. I'll try to get a PR together over the weekend.

lewismc commented 3 years ago

I think we should also push 3.5.0 once we have merged this into the master branch.