ESIPFed / sweet

Official repository for Semantic Web for Earth and Environmental Terminology (SWEET) Ontologies

Natural language definitions/comments missing after v2.3 #218

Closed brandonnodnarb closed 3 years ago

brandonnodnarb commented 3 years ago

Previous versions of SWEET --- v2.0 to v2.3 --- had natural language descriptions in an rdfs:comment tag. Many of these include a general citation to Wikipedia, i.e. trailing text [Wikipedia] without a direct page link.

The attached text file, SWEET_v2.3_URI-comments.txt, holds the results of a SPARQL query for rdfs:comment in v2.3 only. Each line has two fields: IRI -> rdfs:comment_text. The file contains 1119 lines; there may be duplicate entries.
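Since the export may contain duplicates, a small script can reduce it to unique pairs before any further processing. The following is a minimal sketch, assuming each line uses the literal separator ` -> ` between the IRI and the comment text; the function name and the sample IRIs are hypothetical and are not taken from the attached file.

```python
def parse_comment_lines(lines, sep=" -> "):
    """Parse 'IRI -> comment' lines into unique (iri, comment) pairs.

    Assumption: `sep` matches the separator used in the exported file.
    First-seen order is preserved; blank or malformed lines are skipped.
    """
    seen = set()
    pairs = []
    for raw in lines:
        line = raw.strip()
        if not line or sep not in line:
            continue  # nothing to split on
        # Split on the first separator only, so " -> " inside a comment survives.
        iri, comment = line.split(sep, 1)
        pair = (iri.strip(), comment.strip())
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


# Hypothetical sample lines, for illustration only.
sample = [
    "http://sweetontology.net/example#TermA -> First definition [Wikipedia]",
    "http://sweetontology.net/example#TermA -> First definition [Wikipedia]",
    "http://sweetontology.net/example#TermB -> Second definition",
    "",
]
unique_pairs = parse_comment_lines(sample)
print(len(unique_pairs))
```

To run it against the real export, the lines could be read with `open("SWEET_v2.3_URI-comments.txt", encoding="utf-8")` and passed in directly.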

As per the discussion in #208 and #211, it would probably be prudent to carry these over to the current version where applicable (e.g. non-Cryo terms). Even if these natural language descriptors aren't well cited, they were manually added by previous developer(s); at minimum, they provide further information that can be leveraged to improve the accuracy of automated mapping methods.
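To make "where applicable" concrete, here is a rough sketch of one possible selection rule. It is built on assumptions that would need checking against the repository: that Cryo terms can be recognised by the substring `Cryo` in their IRI, that a term should be skipped if it no longer exists in the current version, and that an existing comment in the current version should not be overwritten. The function name and the IRIs are hypothetical.

```python
def select_carry_over(old_comments, current_iris, already_commented, skip_marker="Cryo"):
    """Pick the v2.3 comments that look safe to carry over.

    old_comments:      dict mapping IRI -> v2.3 comment text
    current_iris:      set of IRIs present in the current version
    already_commented: set of IRIs that already have a comment today
    skip_marker:       assumption: Cryo terms carry this substring in the IRI
    """
    selected = {}
    for iri, comment in old_comments.items():
        if skip_marker in iri:
            continue  # Cryo terms are handled separately
        if iri not in current_iris:
            continue  # term was removed or renamed since v2.3
        if iri in already_commented:
            continue  # do not overwrite newer text
        selected[iri] = comment
    return selected


# Hypothetical data, for illustration only.
old = {
    "http://sweetontology.net/example#TermA": "Old definition A",
    "http://sweetontology.net/exampleCryo#TermB": "Old definition B",
    "http://sweetontology.net/example#TermC": "Old definition C",
    "http://sweetontology.net/example#TermD": "Old definition D",
}
current = {
    "http://sweetontology.net/example#TermA",
    "http://sweetontology.net/exampleCryo#TermB",
    "http://sweetontology.net/example#TermD",
}
commented = {"http://sweetontology.net/example#TermD"}

carry = select_carry_over(old, current, commented)
print(sorted(carry))
```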

Thoughts? Comments?

brandonnodnarb commented 3 years ago

(adding to the earlier discussion on Slack)

I had thought a SPARQL update query could do the trick, something along the lines of:

`PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 # Insert every rdfs:comment found in the v2.3 graph into the default graph.
 # Assumes v2.3 is loaded as a named graph under the IRI below.
 INSERT { ?sub rdfs:comment ?com }
 WHERE
 {
      GRAPH <http://sweetontology.net/sweetAll_version=20171004T160715#> { ?sub rdfs:comment ?com }
 }`

(I have not tested --- for syntactic validity or accuracy).

If one were to try to run the query 'directly' via COR, I'm not sure how to specify the SWEET version in the query. I'm also assuming an admin would need to actually issue the query. There may be a better way...
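One possible way around the version question is to skip the cross-graph query and build an `INSERT DATA` update from already-exported IRI/comment pairs. The following is a minimal sketch, not a tested procedure for COR: the function names are hypothetical, the IRIs are assumed to be well-formed, and comments are written as plain literals, so any language tags in the original data would be lost.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def escape_literal(text):
    """Escape a string for use inside a double-quoted SPARQL literal."""
    return (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes stay intact
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_insert_data(pairs):
    """Build a SPARQL 1.1 INSERT DATA update from (iri, comment) pairs.

    Assumption: each IRI is absolute and contains no characters that
    are illegal inside <...>; no validation is done here.
    """
    lines = [f"PREFIX rdfs: <{RDFS}>", "INSERT DATA {"]
    for iri, comment in pairs:
        lines.append(f'  <{iri}> rdfs:comment "{escape_literal(comment)}" .')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical pair, for illustration only.
update = build_insert_data(
    [("http://sweetontology.net/example#TermA", 'A "sample" definition [Wikipedia]')]
)
print(update)
```

The resulting update text targets the default graph of whichever store it is sent to; whoever has write access would still need to issue it.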

brandonnodnarb commented 3 years ago

I have 1053 rdfs:comments from v2.3. I have created a one-off PR to gather feedback before processing the lot.

As these are from a previous version (and dropped somewhere along the line), I'm inclined to add them back with the understanding that any rdfs:comment not reified, or properly cited, is a holdover and needs to be verified and validated.

Please keep in mind the main reason these comments would be useful, in addition to contextualizing some of the entities, is to help automate matching algorithms e.g. #225, #208, and similar. I would expect these comments to drop off in future as they are replaced with cited material.

Thoughts?

brandonnodnarb commented 3 years ago

resolved with #246