ESIPFed / sweet

Official repository for Semantic Web for Earth and Environmental Terminology (SWEET) Ontologies

ISSUE-125 Use wikidata to provide skos:definition to owl:Class'es #205

Closed lewismc closed 4 years ago

lewismc commented 4 years ago

This pull request supersedes #203

@graybeal, regarding the definitions: I simply removed the skos:definition annotations for those.

Only one entry required a skos:historyNote; it is as follows:

###  http://sweetontology.net/phenCryo/GlacierRetreat
:GlacierRetreat rdf:type owl:Class ;
                rdfs:subClassOf :GlacialProcess ,
                                <http://sweetontology.net/phenSystem/Retreat> ,
                                <http://sweetontology.net/procStateChange/Melting> ;
                rdfs:label "glacier retreat"@en ;
                rdfs:seeAlso <https://github.com/ESIPFed/sweet/issues/185> ;
                skos:closeMatch <http://purl.obolibrary.org/obo/ENVO_01001656> ;
                skos:definition _:genid5 ,
                                _:genid6 .

_:genid5 dcterms:created "2020-04-09T10:20:12-08:00Z"^^xsd:dateTimeStamp ;
          dcterms:creator <https://orcid.org/0000-0003-4091-6059> ;
          dcterms:source <https://orcid.org/0000-0003-4808-4736> ;
          rdfs:comment "The process of glacier ice loss."@en ;
          <http://www.w3.org/ns/prov#wasDerivedFrom> <http://purl.obolibrary.org/obo/ENVO_01001656> ;
          skos:historyNote "Native curated definition by ESIP Semantic Harmonization Committee."@en .

_:genid6 dcterms:created "2020-07-20T17:20:42.890"^^xsd:dateTime ;
          dcterms:creator <https://orcid.org/0000-0003-2185-928X> ;
          dcterms:source <http://www.wikidata.org/entity/Q94706497> ;
          rdfs:comment "shrinking of a glacier"@en ;

Finally, @graybeal @brandonnodnarb @rrovetto see the requested generated CSV file.

lewismc commented 4 years ago

There appear to be some files which are failing to write... I am investigating those right now. Also, I noticed that a few other files are failing due to host connection timeout issues... this may have to do with OWLAPI or the host or the process... I am not sure.

lewismc commented 4 years ago

Carried over from #203 from @dr-shorthair

Attempting to load in TopBraid so I can run the SPARQL: a lot of errors from mis-formatted xsd:dateTime and xsd:dateTimeStamp :-( in phenCryo and realmCryo ...

I'll go ahead and fix that. Good catch.

lewismc commented 4 years ago

@dr-shorthair right now the annotation looks as follows

_:genid1 dcterms:created "2020-07-20T17:03:26.420"^^xsd:dateTime ;

I'll go ahead and change these to the following

_:genid1 dcterms:created "2020-07-20T17:03:26.420-07:00"^^xsd:dateTimeStamp ;
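
As a rough illustration of that change, here is a minimal Python sketch that appends an explicit offset to a zoneless xsd:dateTime lexical form. The function name `to_datetimestamp` and the default offset `-07:00` (taken from the example above) are assumptions for illustration, not the code used in this pull request.

```python
import re

# Zoneless xsd:dateTime lexical form, e.g. 2020-07-20T17:03:26.420
# (simplified: ignores negative years and years with more than 4 digits)
NAIVE_DATETIME = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?")


def to_datetimestamp(lexical: str, offset: str = "-07:00") -> str:
    """Append a UTC offset so the value is a valid xsd:dateTimeStamp.

    Hypothetical helper: the offset must be the one that actually applied
    when the annotation was created; -07:00 is only the example value.
    """
    if not NAIVE_DATETIME.fullmatch(lexical):
        # Already zoned or otherwise unexpected: refuse instead of guessing.
        raise ValueError(f"not a zoneless xsd:dateTime: {lexical!r}")
    return lexical + offset


print(to_datetimestamp("2020-07-20T17:03:26.420"))
# -> 2020-07-20T17:03:26.420-07:00
```

Rejecting values that already carry a zone avoids producing a doubled zone designator.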

brandonnodnarb commented 4 years ago

Yes, but realmCryo was last edited 2 months ago...

Protege 5.5 doesn't throw an error, but the definition shows as blank brackets in the editor. Would the zulu time encoding from the Cryo group cause a clash? That's an odd error.

brandonnodnarb commented 4 years ago

Looking at the spreadsheet (many thanks @lewismc) it looks like abbreviations are matching to genetic elements. For example, sic, which is an equivalent class to Standard Industrial Classification (and, IMHO, should probably be a skos:altLabel), is finding a wikidata match as "genetic element in the species Drosophila melanogaster"@en

It looks like any definition starting with "genetic element in the species..." can probably be disregarded. EDIT: there are 9. :)

lewismc commented 4 years ago

@brandonnodnarb thanks for taking a look. Regarding abbreviations yes I am +1 for adding clarifying axioms as you suggest. This is a bit difficult to implement automatically though... I don't know how I would do that.

It looks like any definition starting with "genetic element in the species..." can probably be disregarded.

I can implement this check pretty easily. I'll go ahead and do that. Essentially, it just means that these definitions will be dropped.
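
A minimal sketch of such a check in Python, assuming each CSV row has been read into a dict and that the Wikidata description sits in a column named `definition` (the column names and the helper name are hypothetical; the real generator may be structured differently):

```python
GENETIC_PREFIX = "genetic element in the species"


def drop_genetic_element_rows(rows, column="definition"):
    """Return only the rows whose definition does not start with the prefix.

    Definitions are expected in the CSV form '"some text"@en', so the
    leading quote is stripped before the comparison.
    """
    kept = []
    for row in rows:
        text = row[column].strip().lstrip('"').lower()
        if not text.startswith(GENETIC_PREFIX):
            kept.append(row)
    return kept


sample = [
    {"label": "sic",
     "definition": '"genetic element in the species Drosophila melanogaster"@en'},
    {"label": "glacier retreat",
     "definition": '"shrinking of a glacier"@en'},
]
print([row["label"] for row in drop_genetic_element_rows(sample)])
# -> ['glacier retreat']
```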

dr-shorthair commented 4 years ago

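I got a bunch of reports where

there was a trailing 'Z' after a time-zone offset - it is one or the other, not both!

there were some with spaces embedded - but I can't find them now.

TB might be finding some stuff in imports, but the diagnostics are a bit lacking.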

lewismc commented 4 years ago

@dr-shorthair

there was a trailing 'Z' after a time-zone offset - it is one or the other, not both!

I think this is a bug and should be addressed in a separate pull request. Are you able to submit that one?

there were some with spaces embedded - but I can't find them now.

OK, I've not experienced this one!
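
For anyone who wants to hunt for both kinds of problem, here is a hedged Python sketch that scans Turtle text for malformed timestamp literals. It uses a simplified regex for the XSD lexical form (4-digit years only) and assumes the `xsd:` prefix is used as in the snippets in this thread; `find_bad_timestamps` is a hypothetical name, not part of any tool mentioned here.

```python
import re

# Typed literals such as "..."^^xsd:dateTime or "..."^^xsd:dateTimeStamp
TYPED_LITERAL = re.compile(r'"([^"]*)"\^\^xsd:(dateTimeStamp|dateTime)\b')
# Date, 'T', time, optional fraction, optional zone: Z or +/-hh:mm, never both
XSD_DATETIME = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"
)
HAS_ZONE = re.compile(r"(Z|[+-]\d{2}:\d{2})$")


def find_bad_timestamps(turtle_text):
    """Return (line number, value, reason) for each suspicious literal."""
    problems = []
    for lineno, line in enumerate(turtle_text.splitlines(), start=1):
        for match in TYPED_LITERAL.finditer(line):
            value, datatype = match.group(1), match.group(2)
            if not XSD_DATETIME.fullmatch(value):
                # Catches offset followed by 'Z' as well as embedded spaces.
                problems.append((lineno, value, "malformed lexical form"))
            elif datatype == "dateTimeStamp" and not HAS_ZONE.search(value):
                problems.append((lineno, value, "dateTimeStamp requires a timezone"))
    return problems


sample = '''_:genid5 dcterms:created "2020-04-09T10:20:12-08:00Z"^^xsd:dateTimeStamp ;
_:genid6 dcterms:created "2020-07-20T17:20:42.890"^^xsd:dateTime ;
_:genid1 dcterms:created "2020-07-20T17:03:26.420-07:00"^^xsd:dateTimeStamp ;'''
print(find_bad_timestamps(sample))
# -> [(1, '2020-04-09T10:20:12-08:00Z', 'malformed lexical form')]
```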

rrovetto commented 4 years ago

Protege 5.5 doesn't throw an error, but the definition shows as blank brackets in the editor.

Likewise--it also only displayed blank brackets when I tried.

brandonnodnarb commented 4 years ago

Regarding abbreviations yes I am +1 for adding clarifying axioms as you suggest. This is a bit difficult to implement automatically though... I don't know how I would do that.

Apologies, this ^ was snark -- I have mentioned this before but haven't had time to address it. (I need a sarcasm font.) I think I can write a simple filter to extract the subsets.

Anyway. Assuming #207 fixes the time stamp issue(s), I think this is good to go. There are definitely some definitions that don't seem correct, but they aren't obviously wrong -- aside from the "genetic element" defs.

Assuming nothing else breaks, I think it's a good start. :)

smrgeoinfo commented 4 years ago

incorrect mapping of geologic time intervals (from the csv dump)

sweet:stateTime/Age | age | http://www.wikidata.org/entity/Q185836 | "period of life of a human or organism"@en | should map to https://www.wikidata.org/wiki/Q568683

sweet:stateTime/Epoch | epoch | http://www.wikidata.org/entity/P6259 | "epoch of an astronomical object coordinate"@en | should map to https://www.wikidata.org/wiki/Q754897

sweet:stateTime/Period | period | http://www.wikidata.org/entity/Q101843 | "row in the periodic table of elements"@en | should map to https://www.wikidata.org/wiki/Q392928
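
One way the three corrections above could be applied to the CSV rows is a small manual override table. This is only a sketch: the column names (`class`, `wikidata`, `definition`) and the helper `apply_overrides` are assumptions, and the corrected definitions would still have to be fetched from Wikidata separately.

```python
# Corrections listed above, keyed by SWEET class; values are Wikidata IDs.
OVERRIDES = {
    "sweet:stateTime/Age": "Q568683",
    "sweet:stateTime/Epoch": "Q754897",
    "sweet:stateTime/Period": "Q392928",
}

WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def apply_overrides(rows, overrides=OVERRIDES):
    """Return copies of the rows with corrected Wikidata matches."""
    fixed = []
    for row in rows:
        row = dict(row)  # do not mutate the caller's data
        qid = overrides.get(row["class"])
        if qid is not None:
            row["wikidata"] = WIKIDATA_ENTITY + qid
            # The old definition belongs to the wrong entity; clear it so
            # it can be re-fetched for the corrected one.
            row["definition"] = None
        fixed.append(row)
    return fixed


rows = [{
    "class": "sweet:stateTime/Age",
    "wikidata": "http://www.wikidata.org/entity/Q185836",
    "definition": '"period of life of a human or organism"@en',
}]
print(apply_overrides(rows)[0]["wikidata"])
# -> http://www.wikidata.org/entity/Q568683
```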

smrgeoinfo commented 4 years ago

lots of other incorrect mappings, particularly for commonly used words, some examples:

I spent about 45 minutes scanning through the csv file, and found 138 definitions that are obviously wrong or need review; I looked at maybe a third of the rows. The really technical terms for the most part got reasonable matches. The marked-up spreadsheet is available; problem defs are highlighted in yellow.

lewismc commented 4 years ago

@brandonnodnarb no problems ;)

Great, @smrgeoinfo. Some responses to your comments:

sweet:stateTime/Age | age | http://www.wikidata.org/entity/Q185836 | "period of life of a human or organism"@en | should map to https://www.wikidata.org/wiki/Q568683

... I will update these 3 manually in the next iteration.

lots of other incorrect mappings, particularly for commonly used words, some examples... problem defs are highlighted in yellow.

I'll go ahead and manually remove these incorrect annotations. We can address them in future work.