ESIPFed / sweet

Official repository for Semantic Web for Earth and Environmental Terminology (SWEET) Ontologies

ISSUE-125 Use wikidata to provide skos:definition to owl:Class'es #203

Closed: lewismc closed this issue 4 years ago

lewismc commented 4 years ago

This PR replaces #202.

Please let me know your feedback.

@graybeal there should be no conflicts on the definitions.

pbuttigieg commented 4 years ago

Are the dcterms:creator annotations (e.g. here) at the class level or the definition level? In either case, it's somewhat misleading unless @lewismc created the classes de novo.

Otherwise, this is a pretty good SKOS-level solution to filter Wikidata for the SWEET user base.

However, I think the wikidata defs should be superseded once we get expert input / definitions from activities like the Semantic Harmonization Cluster's cryohackathons (@rduerr). We need a way to identify those activities as definition sources.

graybeal commented 4 years ago

@pbuttigieg the annotations are at the definition level; I'm not sure why you think that's misleading. The definition is declared with a blank node, and the blank node has 4 annotations including dcterms:creator, so it seems clear to me that it applies to the definition that was previously declared. (Otherwise its subject would be :HumanActivity, right?)
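For context, the pattern described here looks roughly like this in Turtle (a minimal sketch: the definition text, ORCID, and Wikidata item are placeholders, and the namespace IRI is illustrative, not copied from the actual PR):

```turtle
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix :        <http://sweetontology.net/humanActivity/> .

# The definition itself is an annotation on the class.
:HumanActivity skos:definition "Placeholder definition text."@en .

# A blank node of type owl:Axiom carries the annotations on that
# specific definition axiom, not on the class as a whole.
[] a owl:Axiom ;
    owl:annotatedSource   :HumanActivity ;
    owl:annotatedProperty skos:definition ;
    owl:annotatedTarget   "Placeholder definition text."@en ;
    dcterms:creator       <https://orcid.org/0000-0000-0000-0000> ;
    dcterms:source        <https://www.wikidata.org/wiki/Q000> .
```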

On the other hand, I'm not sure why all the prefixes were replaced by just the colons; I guess it's just one more default parsing pattern simplification to master…? It actually makes things a little less clear to me, but if it works for others I'll come around.
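Concretely, the bare-colon form works because the empty prefix is bound to the file's own namespace; a minimal sketch (namespace IRI illustrative):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix :    <http://sweetontology.net/humanActivity/> .

# ":HumanActivity" expands to
# <http://sweetontology.net/humanActivity/HumanActivity>,
# so terms local to the file need no named prefix.
:HumanActivity a owl:Class .
```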

I do agree that the cryohackathon activity needs its own identifier to use as the source for any definitions it supplies; that will be a big win! But I don't agree that wikidata defs should be superseded unless there is a consensus that they are faulty.

The argument I am putting forward is that SWEET is not going to become authoritative as to what is the "best" definition. Even when definitions come from experts, there are sooner or later going to be other experts who have created their own favorite way of defining the world. Even if you think having "one answer to rule them all" is the better principle, as a practical matter I don't think the SWEET team will be spending its time wisely by putting itself in the role of adjudicating which sources, including which expert teams, should have their definitions supersede other definitions.

graybeal commented 4 years ago

Lewis, big picture I'm ready to approve (or whatever it is) the change.

But at a detailed level, after 10 minutes I've found two individual cases that need rejection, and one that is arguable:

I think, given that many issues in the 1.2 files, it is best to wait, so as not to include too many conspicuously wrong entries.

It's hard to parse in the existing format, so I can't really justify another 5 hours or so reviewing these one at a time. But if someone can put them into a spreadsheet (just the concept IRI and definition strings; Google Sheets is probably best), I could review all of them in an hour and mark them up for rejection or further evaluation. And it would make it easy for others to check my work.

(Sorry, I could maybe build that table in an hour myself, or maybe not, but I need to get some other stuff done for a little bit.)

lewismc commented 4 years ago

"On the other hand, I'm not sure why all the prefixes were replaced by just the colons; I guess it's just one more default parsing pattern simplification to master…? It actually makes things a little less clear to me, but if it works for others I'll come around."

I kinda agree here as well. This shorthand style is the default in the OWLAPI Java codebase, too.

You may have also noticed that some prefixes were removed... I'm working on addressing some of these, but that is another issue.

brandonnodnarb commented 4 years ago

Thoughts from an admittedly quick review:

1) This is pretty rad.

2) The lack of prefixes should be the ttl shorthand for the base, i.e. no need to prefix self. This seems to be consistent from my spot check, but I have not looked thoroughly.

3) It may be appropriate to use skos:related for the automated linkage, with more refined semantics --- e.g. skos:closeMatch or skos:exactMatch (or whatever) --- reserved for verified relationships. I'm assuming that any/all verification is going to be a human task, at least for the time being, and this could be an easy way to find/query "verified" definitions versus automated linkages without conflating them with accuracy, efficacy, etc. (A sketch of this follows after the list.)

4) I propose creating a separate graph/file for contributions --- i.e. contributions.ttl --- which contains all the provenance metadata: contributions, creators, edits, etc. (Items 4-6 are sketched together after the list.)

5) Following from 4, it probably also makes sense to create a contributors.ttl file where anyone and everyone can add their desired dc: info, probably similar to what is currently described on the recognition page.

6) Following from 3 and 5, something along the lines of a sweet:verifiedBy relation may also be appropriate. Perhaps as a type of prov:Activity?

7) The matching is entirely based on string matching against the labels, correct? At any rate, it may be beneficial, particularly for verification, if there were a table listing: realm | NL label | wikidata link | wikidata def

If such a table existed, it might make verification a bit more straightforward, as it could be filtered and read/interpreted without the surrounding... faff.

@lewismc is this type of table/csv easily outputted from your scala code, or is there a need for a separate filtering script or SPARQL query?
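A minimal sketch of the split proposed in item 3 (the Wikidata Q-id and namespace IRI are placeholders):

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix wd:   <http://www.wikidata.org/entity/> .
@prefix :     <http://sweetontology.net/humanActivity/> .

# Automated, unverified string match against a Wikidata label:
:HumanActivity skos:related wd:Q000 .

# After human verification, the same link is promoted to a
# stronger mapping property:
:HumanActivity skos:closeMatch wd:Q000 .
```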
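And a combined sketch of items 4-6 (every identifier here, including sweet:verifiedBy itself, is hypothetical):

```turtle
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov:  <http://www.w3.org/ns/prov#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix sweet: <http://sweetontology.net/> .
@prefix soha:  <http://sweetontology.net/humanActivity/> .
@prefix :      <http://sweetontology.net/contributions/> .

# Item 6: a hypothetical annotation property linking a concept to
# the prov:Activity that verified its definition.
sweet:verifiedBy a owl:AnnotationProperty ;
    rdfs:range prov:Activity .

# Item 4 (contributions.ttl): the verification event itself.
:verification-001 a prov:Activity ;
    prov:wasAssociatedWith :jdoe ;
    prov:endedAtTime "2020-05-01T00:00:00Z"^^xsd:dateTimeStamp .

# Item 5 (contributors.ttl): a contributor's self-maintained entry.
:jdoe a prov:Agent ;
    rdfs:label "J. Doe (placeholder)"@en .

# The realm file then points back into the provenance graph.
soha:HumanActivity sweet:verifiedBy :verification-001 .
```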

brandonnodnarb commented 4 years ago

Ach. OK, so items 4-6 in my previous comment should probably be their own issues for discussion. If any of you think they deserve discussion, I can add them as issues with at least a minimal proposal.

brandonnodnarb commented 4 years ago

I would approve this for a 'development' branch.

On a related note, would stable and development branches make sense for this hopefully burgeoning crowd? :)

EDIT: added the latter here --- #204

lewismc commented 4 years ago

"The matching is entirely based on string matching against the labels, correct?"

Correct

"@lewismc is this type of table/csv easily outputted from your scala code, or is there a need for a separate filtering script or SPARQL query?"

I will go ahead and create the table, as John also requested it.

rrovetto commented 4 years ago

"But if someone can put them into a spreadsheet (just need concept IRI and definition strings), maybe on Google sheets is best, I could probably review all of them in an hour, and mark them up for rejecting or further evaluating. And it would make it easy for others to check my work."

Update: I exported the module https://github.com/lewismc/sweet/blob/3f6ede0dfbfd6c6c5e1b47b13011a12c08224c98/src/human.ttl as an example, but some content wasn't displayed. Trying another way.

lewismc commented 4 years ago

@rrovetto

"To confirm, is this the correct source?..."

Correct source for what? That is a collection of all of the file-level base URIs for the entire SWEET ontology suite.

dr-shorthair commented 4 years ago

Attempting to load this in TopBraid so I can run the SPARQL: a lot of errors from mis-formatted xsd:dateTime and xsd:dateTimeStamp :-(
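For reference (examples only, not triples from the PR): xsd:dateTimeStamp requires an explicit timezone offset, while xsd:dateTime allows it to be omitted. The subjects and namespace below are placeholders.

```turtle
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix :        <http://sweetontology.net/humanActivity/> .

# xsd:dateTime: the timezone offset is optional.
:example1 dcterms:created "2020-05-01T12:00:00"^^xsd:dateTime .

# xsd:dateTimeStamp: the timezone offset is REQUIRED; omitting it
# makes the literal invalid for this datatype.
:example2 dcterms:created "2020-05-01T12:00:00Z"^^xsd:dateTimeStamp .
```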

dr-shorthair commented 4 years ago

in phenCryo and realmCryo ...