NCEAS / adc-disciplines

Discipline taxonomy derived from re3data/DFG subject classification

Feedback on discipline class hierarchy and ontology #4

Closed: amoeba closed this issue 2 years ago

amoeba commented 2 years ago

@mbjones asked me to look over the latest copy of the disciplines ontology (ADCAT). I looked at whether the class hierarchy makes sense and I also looked at the ontology itself.

Class hierarchy notes

  1. I see a class "General Genetics". Why not just "Genetics" here?
  2. Should we toss in an "Evolutionary Biology" class under "Biology"?

My feedback here is pretty superficial since I'm not super familiar with the breadth and depth of submissions the ADC is managing nowadays.

Ontology notes

  1. Do we want more annotation properties on these classes right now? I know it'd take time to work up definitions for everything, for example.
  2. I'm seeing some funny stuff when I open things in Protégé. Namely, the Ontology Header section is empty. When I looked at the TTL file, I saw that it uses the odo prefix, which is defined as @prefix odo: <.> .

I went about fixing that and Protégé was much happier. Let me know if that makes sense and I'll toss this patch up:

diff --git ADCAT.ttl ADCAT.ttl
index 84b5c31..bd298a3 100644
--- ADCAT.ttl
+++ ADCAT.ttl
@@ -1,12 +1,13 @@
 @base <https://purl.dataone.org/odo/ADCAT_> .
 @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
-@prefix odo: <.> .
+@prefix odo: <https://purl.dataone.org/odo/> .
 @prefix dc: <http://purl.org/dc/elements/1.1/> .
 @prefix obo: <http://purl.obolibrary.org/obo/> .
 @prefix owl: <http://www.w3.org/2002/07/owl#> .
 @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
 @prefix terms: <http://purl.org/dc/terms/> .
 @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
+@prefix adcat: <https://purl.dataone.org/odo/ADCAT_> .

 odo:ADCAT_
     terms:created "2021-11-10"^^xsd:date ;
@@ -16,324 +17,323 @@ odo:ADCAT_
     terms:title "Arctic Data Center Annotation Terms Ontology (ADCAT)" ;
     owl:versionIRI <ADCAT/0.2.0> ;
     owl:versionInfo "Version 0.2.0" ;
-    <rdf:type> owl:Ontology .
+    rdf:type owl:Ontology .

-odo:ADCAT_00000
+adcat:00000
     a owl:Class ;
     rdfs:label "Academic Discipline" .

-odo:ADCAT_00001
+adcat:00001
     a owl:Class ;
     rdfs:label "Humanities and Social Sciences" ;
-    rdfs:subClassOf odo:ADCAT_00000 .
+    rdfs:subClassOf adcat:00000 .

[patch above is truncated since the rest of it repeats]
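
For context on where a value like <.> could come from: Turtle resolves relative IRIs against the current @base, so with the @base shown in the patch, <.> is a relative reference rather than an empty one. The sketch below uses Python's standard urljoin as a stand-in for that resolution step (an assumption: it follows RFC 3986 for these inputs, and it says nothing about what rdflib/redland or Protégé actually do internally). The resolve helper is a hypothetical name used only for illustration.

```python
from urllib.parse import urljoin

# The @base declared at the top of ADCAT.ttl in the patch above.
BASE = "https://purl.dataone.org/odo/ADCAT_"


def resolve(relative_iri, base=BASE):
    """Resolve a relative IRI against the base (hypothetical helper).

    urljoin is used here as a stand-in for the RFC 3986 resolution
    that Turtle parsers apply to relative IRIs.
    """
    return urljoin(base, relative_iri)


# The relative prefix IRI from "@prefix odo: <.> ."
odo_namespace = resolve(".")

# The relative versionIRI from "owl:versionIRI <ADCAT/0.2.0>"
version_iri = resolve("ADCAT/0.2.0")

print(odo_namespace)  # https://purl.dataone.org/odo/
print(version_iri)    # https://purl.dataone.org/odo/ADCAT/0.2.0
```

If this reading is right, <.> and <https://purl.dataone.org/odo/> name the same namespace once resolved, and the absolute form in the patch is mainly friendlier to tools that do not resolve prefix IRIs against @base.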

mbjones commented 2 years ago

Thanks, @amoeba.

I agree that "General Genetics" would be better labeled "Genetics". Most of these names/labels came directly from the re3data vocabulary, but we should relabel them to make sense to us. I did add an Evolution entry, which was missing from re3data, so you might have missed it at the end of the file. Maybe it should be relabeled "Evolutionary Biology".

odo:ADCAT_00066
    a owl:Class ;
    rdfs:label "Evolution" ;
    rdfs:subClassOf odo:ADCAT_00011 .

Regarding the ttl file validity, I built it using @cboettig's rdflib package, which in turn used redland under the hood. Rather than modifying the ontology directly, it would be good to figure out what needs to change in the create_rdf() function in adc-disciplines.R so we can automate the process of creating new versions. I'm a little surprised it wasn't valid out of the box, so that's something to track down.

And yes, I think we need more annotation properties. I hesitated because I didn't know how to "Define" the disciplines beyond what is in the label. For example, the definition of the subclass of Academic Discipline labeled Genetics might be something like "A discipline focused on the study of genetics." Not sure if that is actually helpful or clarifying beyond the label. The boundaries among disciplines are incredibly fuzzy, so I don't think precise definitions are actually possible. I would also like to add a property relating these back to the re3data term identifiers.
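
To make the idea concrete, here is a rough Python sketch of what a class stanza with a definition and a re3data back-reference could look like. Everything about the property choice is an assumption, not something decided in this thread: skos:definition would need a skos prefix (http://www.w3.org/2004/02/skos/core#) declared in the file, terms:source is just one candidate for the re3data link, and the definition text and the "RE3DATA-ID-PLACEHOLDER" value are placeholders. The function names are hypothetical, and this is not how create_rdf() builds the file.

```python
def turtle_literal(text):
    """Quote a single-line string as a Turtle literal.

    Only backslashes and double quotes are escaped, so this assumes
    the text contains no newlines.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def class_stanza(local_id, label, parent_id, definition, source_id):
    """Build one Turtle class stanza (hypothetical helper).

    skos:definition and terms:source are assumed property choices.
    """
    lines = [
        f"odo:ADCAT_{local_id}",
        "    a owl:Class ;",
        f"    rdfs:label {turtle_literal(label)} ;",
        f"    skos:definition {turtle_literal(definition)} ;",
        f"    terms:source {turtle_literal(source_id)} ;",
        f"    rdfs:subClassOf odo:ADCAT_{parent_id} .",
    ]
    return "\n".join(lines)


# The Evolution class from the snippet above, with placeholder annotations.
stanza = class_stanza(
    local_id="00066",
    label="Evolution",
    parent_id="00011",
    definition="A discipline focused on the study of evolution.",  # placeholder wording
    source_id="RE3DATA-ID-PLACEHOLDER",  # not a real re3data identifier
)
print(stanza)
```

Keeping the definition and the source identifier as separate annotation properties would let the labels be reworded freely while the link back to the original re3data term stays intact.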

amoeba commented 2 years ago

Turned out to be a small change to make the serialization look good and Protégé happy: https://github.com/NCEAS/adc-disciplines/pull/5. I still see a weird line in the TTL:

@prefix odo: <.> .

which I can't fix no matter what I do. It looked like an issue with defining the odo prefix twice in the namespaces list, but removing the duplicate didn't fix things. Might be an edge case in rdflib/redland?

The ttl file in the PR loads fine and looks good in Protégé.
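
If the relative prefix line keeps coming back from the serializer, one possible workaround is to post-process the serialized text and rewrite relative @prefix IRIs into absolute ones using the file's @base. The Python sketch below is only an illustration of that idea under some assumptions: a single @base declaration, one declaration per line, and urljoin as a stand-in for Turtle's relative IRI resolution. The absolutize_prefixes name is hypothetical, and this is not part of adc-disciplines.R.

```python
import re
from urllib.parse import urljoin, urlsplit

# Matches "@base <...> ." and "@prefix name: <...> ." at the start of a line.
BASE_RE = re.compile(r"^@base\s+<([^>]*)>\s*\.", re.MULTILINE)
PREFIX_RE = re.compile(r"^(@prefix\s+[^\s:]*:\s*)<([^>]*)>(\s*\.)", re.MULTILINE)


def absolutize_prefixes(ttl_text):
    """Rewrite relative @prefix IRIs as absolute ones (hypothetical helper).

    Assumes at most one @base declaration; if none is found, the text
    is returned unchanged.
    """
    base_match = BASE_RE.search(ttl_text)
    if base_match is None:
        return ttl_text
    base = base_match.group(1)

    def fix(match):
        iri = match.group(2)
        if urlsplit(iri).scheme:
            # Already absolute (has a scheme such as http or https).
            return match.group(0)
        return f"{match.group(1)}<{urljoin(base, iri)}>{match.group(3)}"

    return PREFIX_RE.sub(fix, ttl_text)


sample = """@base <https://purl.dataone.org/odo/ADCAT_> .
@prefix odo: <.> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
"""

fixed = absolutize_prefixes(sample)
print(fixed)
```

A text-level rewrite like this leaves the triples themselves untouched, so it is a stopgap at best; tracking down why the serializer emits the relative form would still be the cleaner fix.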