ExPaNDS-eu / pan-ontologies-api

BSD 2-Clause "Simplified" License

Add disjointWith parsing #79

Closed minottic closed 2 months ago

minottic commented 2 months ago

The OWL file now has a disjointWith keyword, which requires parsing.
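As a rough illustration of what parsing this keyword could involve, here is a minimal sketch using only the Python standard library. It assumes the OWL file is serialised as RDF/XML and that disjointness is written as owl:disjointWith child elements with an rdf:resource attribute; other forms (owl:AllDisjointClasses, nested class expressions) are not handled. The function name and the example.org IRIs are made up for the example and are not taken from pan-ontologies-api or PaNET.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"


def disjoint_pairs(owl_xml):
    """Return the unordered pairs of class IRIs linked by owl:disjointWith.

    Hypothetical helper: assumes RDF/XML with rdf:about / rdf:resource.
    """
    root = ET.fromstring(owl_xml)
    pairs = set()
    for cls in root.iter(f"{{{OWL}}}Class"):
        subject = cls.get(f"{{{RDF}}}about")
        if subject is None:
            continue  # anonymous class expression, skipped in this sketch
        for dw in cls.findall(f"{{{OWL}}}disjointWith"):
            target = dw.get(f"{{{RDF}}}resource")
            if target is not None:
                # disjointWith is symmetric, so store the pair unordered
                pairs.add(frozenset((subject, target)))
    return pairs


# Illustrative input only; these IRIs are not real PaNET terms.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/A">
    <owl:disjointWith rdf:resource="http://example.org/B"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/B"/>
</rdf:RDF>"""

print(disjoint_pairs(SAMPLE))
```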

paulmillar commented 2 months ago

I don't know if this helps, but with PaNET commit dfce0a0d9d, we are now generating both the old-style output and the output after running it through a reasoner.

You should find this "reasoned output" as an artefact from the CI/CD pipelines: they are kept for some 7 days.

The reasoned output isn't part of the v1.1.0 release assets, but we should include it as part of the next release's set of assets.

minottic commented 2 months ago

can I see a reasoner output example anywhere? Probably going forward it will make sense to discontinue this service and use an existing owl parser, if it exists. But for now, how can I know if some extra owl tags are used, similar to this disjointWith case? In the past I could simply do a commit comparison with the .owl file
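One possible way to spot newly used OWL tags without reading a full diff is to compare the set of element names in two versions of the file. The sketch below assumes both versions are RDF/XML (where every predicate appears as an element name) and uses only the standard library; the function names and the tiny OLD/NEW documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_tags(owl_xml):
    """Return the distinct element names (in {namespace}local form) in an XML document."""
    return {el.tag for el in ET.fromstring(owl_xml).iter()}


def new_tags(old_xml, new_xml):
    """Return, sorted, the element names present in new_xml but absent from old_xml."""
    return sorted(element_tags(new_xml) - element_tags(old_xml))


# Illustrative documents only; not real PaNET content.
OLD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/A"/>
</rdf:RDF>"""

NEW = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/A">
    <owl:disjointWith rdf:resource="http://example.org/B"/>
  </owl:Class>
</rdf:RDF>"""

print(new_tags(OLD, NEW))
```

This only reports element names, so a new attribute or a new value of an existing tag would go unnoticed.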

paulmillar commented 2 months ago

Perhaps the easiest way to get the reasoner output is to use the CI/CD-generated artefact.

The general procedure is:

  1. Go to the PaNET main page.
  2. Click on the green tick near the top of the page, showing the CI/CD result of the most recent commit.
  3. Against the "panet-build / build (push)" row, select the "Details" link.
  4. Select the "Summary" link (from the left-hand side navigation bar).
  5. Under "Artifacts", select the "PaNET" artefact. This will trigger the download of the PaNET.zip file.
  6. Within the PaNET.zip file, the file PaNET_reasoned.owl contains the reasoned output.
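Once PaNET.zip has been downloaded by hand, the last step can be scripted. This is a small sketch with the standard zipfile module; the function name is hypothetical, and since the internal layout of the archive is not described here, it matches the member by base name rather than assuming the file sits at the top level.

```python
import zipfile


def read_reasoned_owl(zip_path, member="PaNET_reasoned.owl"):
    """Return the bytes of the reasoned OWL file stored in the downloaded archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Compare base names so a file nested inside a folder is still found.
            if name.rsplit("/", 1)[-1] == member:
                return archive.read(name)
    raise FileNotFoundError(f"{member} not found in {zip_path}")
```

For example, read_reasoned_owl("PaNET.zip") would return the file content as bytes, ready to be passed to an XML or RDF parser.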

For the most recent commit, here is the PaNET.zip file. However, this link will likely break soon.

minottic commented 2 months ago

mmm the reasoned looks very similar to the plain .owl, what's the main advantage to use that? So, apart from downloading the owl and doing the diffs by hand, you don't plan to keep the committed versions of the owl file, apart from attaching to releases?

paulmillar commented 2 months ago

> mmm the reasoned looks very similar to the plain .owl, what's the main advantage to use that?

The "reasoned" version includes the implied relationships by running the reasoner. This is currently limited to additional subClassOf via RDFS entailment, which (I believe) is no big deal for the ontology-api service. Nevertheless, running a reasoner provides a non-trivial increase of subClassOf relationships, from 718 to 3168 (some 4.4 times as many).
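For anyone wanting to reproduce this kind of count locally, a minimal sketch follows. It assumes an RDF/XML file in which each subClassOf assertion is an rdfs:subClassOf element; the exact counting method behind the 718 and 3168 figures is not stated here, so results may differ. The function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def count_subclass_assertions(owl_xml):
    """Count rdfs:subClassOf elements anywhere in an RDF/XML document."""
    root = ET.fromstring(owl_xml)
    return sum(1 for _ in root.iter(f"{{{RDFS}}}subClassOf"))


# Illustrative input only; these IRIs are not real PaNET terms.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/C">
    <rdfs:subClassOf rdf:resource="http://example.org/B"/>
    <rdfs:subClassOf rdf:resource="http://example.org/A"/>
  </owl:Class>
</rdf:RDF>"""

print(count_subclass_assertions(SAMPLE))

# Ratio of the two counts quoted above.
print(round(3168 / 718, 1))  # 4.4
```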

In the future, more complex relationships may be introduced. For example, we hope to identify the term hierarchy implicitly, via the reasoner, rather than identifying the relationships explicitly (as we do currently). Providing the reasoned version of the ontology would allow software to use these relationships without the service running the ontology through a reasoner itself.

Just to be clear, this is speculation on how the ontology will evolve. There's no timeline for when the reasoned version will include anything other than additional assertions from RDFS entailment.

> So, apart from downloading the owl and doing the diffs by hand, you don't plan to keep the committed versions of the owl file, apart from attaching to releases?

No, we don't plan to provide any version of PaNET via the git repository.

Just to explain ...

We now have a script for generating PaNET from the input csv file and some ontology-targeted metadata (e.g., who contributed, what the ontology is called, etc). Someone can use the script to generate the output at any time, including locally checked out versions with manual changes. There is also a docker image, to make this process easier.

The generated output is available as part of the CI/CD verification process (via a GitHub workflow). For consistency, the workflow uses the docker image. Changes are submitted as pull-requests and checked: we want to make sure we don't make changes that break the script, as it is part of our release process.

The GitHub workflow artefacts (CI/CD) are time limited, per GitHub policy. Therefore, these are not intended as a way for people to obtain PaNET.

The "correct" way is either to download the artefacts from a release, or to use the stable Persistent URL (https://purl.org/pan-science/PaNET/PaNET.owl). This resolves to the PaNET.owl artefact from whichever GitHub version is tagged latest.

minottic commented 2 months ago

so the reasoned also has the grandchildren computed and annotated with subclassOf grandparent?

paulmillar commented 2 months ago

> so the reasoned also has the grandchildren computed and annotated with subclassOf grandparent?

Exactly.

This is the RDFS entailment: subClassOf is transitive and the reasoner provides the transitive closure of these relationships.
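To make the grandchild example concrete, here is a minimal sketch of a transitive closure over (child, parent) pairs. It covers only the transitivity of subClassOf; an actual reasoner may add other entailments, and this is not the code PaNET or ontologies-api uses. The function name and the single-letter class names are hypothetical.

```python
def subclass_closure(direct_pairs):
    """Return all (descendant, ancestor) pairs implied by transitivity.

    direct_pairs: iterable of (child, parent) tuples, i.e. the asserted
    subClassOf relationships.
    """
    parents = {}
    for child, parent in direct_pairs:
        parents.setdefault(child, set()).add(parent)

    closure = set()
    for start in parents:
        # Depth-first walk up the hierarchy from each class.
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # already visited; also guards against cycles
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure


# C is a subclass of B, and B is a subclass of A.
asserted = {("C", "B"), ("B", "A")}
print(sorted(subclass_closure(asserted)))
# [('B', 'A'), ('C', 'A'), ('C', 'B')]
```

The extra pair ('C', 'A') is the grandchild-to-grandparent relationship that the reasoned file states explicitly.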

The next PaNET version should provide both a "concise" and a "reasoned" version of PaNET. For some software (like ontologies-api) there's currently no big difference. This is because we're largely limited to RDFS entailment and (IIRC) ontologies-api implements support for RDFS entailment.

However, in the future, we might use more OWL features (moving away from only using RDFS relationships) to express more complex relationships in a manageable way. When doing this, the distinction between the "concise" and "reasoned" versions will likely matter to ontologies-api.

minottic commented 2 months ago

ah thanks! So that might be quite useful indeed, I have been thinking that pan-api should rely on existing reasoners rather than parsing the owl itself. Would it make sense to make the ...reasoned.owl file part of the release assets?

paulmillar commented 1 month ago

> pan-api should rely on existing reasoners

Yes, I think this would be a good idea. But, it's easy for me to say this, when I don't have to do the work!

> Would it make sense to make the ...reasoned.owl file part of the release assets?

Yes, I would say so. The primary reason for updating the build container (to run the result through a reasoner and save the result) was to include the reasoned output as a release asset.

I'd say that, assuming it isn't urgent, we can do this as part of the next PaNET release.