Working with owl files and protege

gkoum commented 7 months ago

Editing the OWL file directly in Protege and producing the CSV file as a deliverable seems more straightforward, and needs to be discussed:

By editing the OWL file directly, the implications for the ontology are more obvious. For instance, in merge request #107 several additions were made because the reasoner had been run. As a result, a huge number of super-classes were added to each class and individual.
Running the reasoner, and possibly some SPARQL queries (to find duplicated abbreviations, as discussed), would allow us to graphically inspect the implications of our modifications/additions.
We can produce the CSV file, and many more files, as our release artifacts using ROBOT, which is a very useful tool.
```
robot export --input nucleus_part_of.owl \
--header "ID|LABEL" \
--export nucleus.csv
```
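
To illustrate the first point, here is a minimal Python sketch of why a reasoner run can add so many super-classes. It assumes the additions in #107 were inferred subclass relations written back into the file, which is only my reading of that merge request. The function names and the four-class hierarchy are hypothetical placeholders, not PaNET terms.

```python
def all_superclasses(direct_parents, cls):
    """Collect every ancestor of cls by walking the asserted parent links."""
    found = set()
    stack = list(direct_parents.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:  # also protects against cycles
            found.add(parent)
            stack.extend(direct_parents.get(parent, ()))
    return found


def materialised_additions(direct_parents):
    """Return {class: superclasses that are inferred but were never asserted}."""
    return {
        cls: all_superclasses(direct_parents, cls) - set(parents)
        for cls, parents in direct_parents.items()
    }


# Hypothetical four-level hierarchy; the names are placeholders, not PaNET terms.
ASSERTED = {
    "D": {"C"},
    "C": {"B"},
    "B": {"A"},
    "A": set(),
}
```

Calling `materialised_additions(ASSERTED)` lists, per class, the extra super-classes that would appear if every inferred ancestor were saved alongside the asserted one. The deeper the hierarchy, the more of these accumulate.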

gkoum commented 7 months ago

Incorporate protege editing if needed in #46.

paulmillar commented 2 months ago

I kinda disagree with this on a few points.

First, it (currently) doesn't matter whether the CSV or the OWL is the authoritative source of information, as both are equivalent when generating PaNET. In effect, the CSV file is just a proprietary format for describing the same ontology, much in the same way an RDF/XML file, a Turtle file or an N3 file could serialise the same ontology.

(If we want to include information that cannot be represented by an OBO-ROBOT CSV file then we would likely need to move to direct editing, anyway.)

Second, Protege, although very good software, is not necessarily more intuitive than editing an OBO-ROBOT CSV file. Moreover, I see having to download Protege and learn how to use it as a barrier to involving new members. There's a good chance that a random person could simply modify a CSV file with their favourite text editor, while the same isn't true for OWL in RDF/XML or even in Turtle.

Graphically inspecting the implications of our modifications/additions is certainly a good idea, but this doesn't require that the authoritative source is RDF/XML. This is true even if the ontology comparison tool requires PaNET in RDF/XML. Generating the RDF/XML from the OBO-ROBOT CSV is now trivial, and the build container could be enhanced so it also supports visualising diffs.
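
As a rough illustration that a useful comparison doesn't need RDF/XML as the source, here is a minimal Python sketch of a term-level diff between two versions of the CSV. It is a plain textual comparison, not the graphical inspection or the ontology comparison tool mentioned above. The column names (`id`, `label`, `parent`) and the sample rows are hypothetical, and it assumes a single header row, ignoring any ROBOT template row.

```python
import csv
import io


def load_terms(csv_text, id_col="id"):
    """Map each term id to its full row (a dict of column -> value)."""
    return {row[id_col]: row for row in csv.DictReader(io.StringIO(csv_text))}


def diff_terms(old_csv, new_csv, id_col="id"):
    """Return (added ids, removed ids, {changed id: {column: (old, new)}})."""
    old, new = load_terms(old_csv, id_col), load_terms(new_csv, id_col)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for term_id in sorted(old.keys() & new.keys()):
        columns = old[term_id].keys() | new[term_id].keys()
        delta = {
            col: (old[term_id].get(col), new[term_id].get(col))
            for col in sorted(columns)
            if old[term_id].get(col) != new[term_id].get(col)
        }
        if delta:
            changed[term_id] = delta
    return added, removed, changed


# Hypothetical sample data; not taken from PaNET.
OLD = "id,label,parent\nEX:001,technique,\nEX:002,diffraction,EX:001\nEX:003,imaging,EX:001\n"
NEW = "id,label,parent\nEX:001,technique,\nEX:002,diffraction technique,EX:001\nEX:004,spectroscopy,EX:001\n"
```

`diff_terms(OLD, NEW)` returns the added ids, the removed ids and the per-column changes, which could be printed in a merge request comment or fed to whatever visualisation the build container ends up supporting.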

The point about SPARQL queries is another very good one. However, I was thinking this is something we would do by using SHACL and having a set of validation rules that would be applied via CI/CD in GitHub. Again, this doesn't require that the source is RDF/XML.
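
To make that concrete, here is a minimal Python sketch of the duplicated-abbreviation check run directly against the CSV. It is neither SHACL nor SPARQL, just a plain stand-in to show that such a rule can be applied without RDF/XML as the source. The column names (`id`, `abbreviation`), the `template_rows` parameter and the sample data are assumptions for illustration. Comparing abbreviations case-insensitively is also my choice, not an agreed rule.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_abbreviations(csv_text, id_col="id", abbr_col="abbreviation",
                                 template_rows=0):
    """Return {abbreviation: [term ids]} for abbreviations shared by several terms.

    template_rows is the number of rows directly after the header to skip,
    e.g. 1 if the file carries a ROBOT template row (an assumption here).
    """
    seen = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader):
        if index < template_rows:
            continue
        abbr = (row.get(abbr_col) or "").strip()
        if abbr:  # terms without an abbreviation are not checked
            seen[abbr.lower()].append(row[id_col])
    return {abbr: ids for abbr, ids in seen.items() if len(ids) > 1}


# Hypothetical sample data; not taken from PaNET.
SAMPLE = """id,label,abbreviation
EX:001,first example technique,FET
EX:002,second example technique,SET
EX:003,further example technique,fet
"""
```

A CI job could call `find_duplicate_abbreviations` on the checked-in CSV and fail the build when the result is non-empty.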

My opinion is that, in the future, we may find that using OBO-ROBOT CSV is more cumbersome than using Protege and switch to storing PaNET as RDF/XML, but I don't think we're there yet.

gkoum commented 2 months ago

Paul, you are right that it mainly has to do with what people are used to. As PaNET evolves, we will probably have to use more advanced tools like Protege, which lets you see the big picture of the ontology as well as the implications of your modifications. It is true, though, that we can stick to the existing CSV approach, since it seems to be enough for now. I would suggest that the PaNET working group start using Protege in parallel for now, for its visualization and advanced querying capabilities.

ExPaNDS-eu / ExPaNDS-experimental-techniques-ontology #109