INCATools / ontology-development-kit

Bootstrap an OBO Library ontology
http://incatools.github.io/ontology-development-kit/
BSD 3-Clause "New" or "Revised" License

Proposed change of ontology IRIs and version IRIs #1037

Closed gouttegd closed 2 months ago

gouttegd commented 3 months ago

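Currently, for a given ontology ont, the release artifacts have an ontology ID of the form:

Ontology IRI: http://purl.obolibrary.org/obo/<ont>/<ont>-<variant>.owl
Version IRI: http://purl.obolibrary.org/obo/<ont>/releases/<release date>/<ont>-<variant>.owl

(assuming a default value for ONTBASE).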

I am not sure this is ideal. I’d like to propose to change the ontology IRI to always be http://purl.obolibrary.org/obo/<ont>.owl, so that it does not include the variant information.

Rationale:

The ontology ID (ontology IRI + version IRI) should provide three distinct pieces of information: which ontology it is, which variant of that ontology it is, and which release it is.

Currently, we do not provide the first bit at all. Instead, we waste the ontology IRI field by repeating in it a piece of information (the variant) that is already present in the version IRI. The only way to get the name of the ontology itself is to perform some manipulation on the ontology IRI (to transform http://purl.obolibrary.org/obo/<ont>/<ont>-<variant>.owl into http://purl.obolibrary.org/obo/<ont>.owl).
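
For illustration, the manipulation mentioned above could look like the following Python sketch. The function name `canonical_ontology_iri` and the regular expression are hypothetical, and the sketch assumes the default OBO PURL layout (a default ONTBASE); it is not something the ODK provides.

```python
import re

# Assumed layout of a variant ontology IRI under the default OBO PURL base:
#   http://purl.obolibrary.org/obo/<ont>/<ont>-<variant>.owl
_VARIANT_IRI = re.compile(
    r"^(?P<base>http://purl\.obolibrary\.org/obo/)"
    r"(?P<ont>[^/]+)/(?P=ont)-(?P<variant>[^/]+)\.owl$"
)


def canonical_ontology_iri(ontology_iri: str) -> str:
    """Return the variant-less IRI, or the input unchanged if it does not match."""
    match = _VARIANT_IRI.match(ontology_iri)
    if match is None:
        # Already canonical, or not an IRI of the assumed form: leave it alone.
        return ontology_iri
    return f"{match.group('base')}{match.group('ont')}.owl"


print(canonical_ontology_iri("http://purl.obolibrary.org/obo/cl/cl-base.owl"))
```

Any ontology using a non-default ONTBASE would need a different pattern, which is part of why this kind of string manipulation is fragile.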

I believe it would be more useful to use the ontology IRI to indicate only which ontology it is, and to mention the variant and the release only in the version IRI.

gouttegd commented 3 months ago

(There is also the issue that it’s not great that the filename extension is part of the ontology IRI, but that’s another problem and it may be way too late to change that.)

cmungall commented 3 months ago

I think that was a mistake. As was not using the namespace as the ontology id.

As far as not including the variant, would we not consider the basic variant to be a different semantic artifact? Wouldn’t conflating these conflict with the follow-your-nose principles of linked data?

But I see the motivation. Really I don’t think this part of OWL is designed very well.

But perhaps the best workaround is to introduce a new OMO property to represent the logical name/iri of the ontology.

gouttegd commented 3 months ago

As far as not including the variant, would we not consider the basic variant to be a different semantic artifact?

It is certainly a different artifact, but it’s still the same ontology (“ontology” here in the sense: the same project).

The same way as http://purl.obolibrary.org/obo/cl/releases/2024-02-13/cl-base.owl is a different artifact than http://purl.obolibrary.org/obo/cl/releases/2024-01-04/cl-base.owl, and yet they both have the same ontology ID (http://purl.obolibrary.org/obo/cl/cl-base.owl). They are different versions, but they are still the same ontology.

What I propose is using the ontology IRI to refer strictly to the overall ontology name (represented as the IRI of the canonical release file, so http://purl.obolibrary.org/obo/cl.owl for example), and to relegate the variant and version information to the version IRI, instead of having the variant bit present in both.

cmungall commented 3 months ago

Broadly in favor but need to think through whether we are making things even less intuitive for users. Right now I can always take the ontology IRI and do a curl and get the latest version of that variant. To support this use case we need a new OMO field.

(arguably this is already the case for formats)

Minor: What about release artefacts like subsets?

FWIW here is what I think we should be aiming for mid to long term (but others may not be on board). This is based on observations that merged files are a confusing hack that throw off the majority of non-OBO elite users.

Base files should be ubiquitous and the default. Ultimately there will be no need for distributed merged products in the future, and if people really need them it should be possible to generate them in milliseconds using modern tools. Tools learn to do The Right Thing with base products (not show danglers in hierarchical displays at a minimum; or intelligently resolve dynamically).

With this there is really no need for variants that are not explicit subsets or supersets

matentzn commented 3 months ago

I am not convinced by either of your positions here. I think the fact that the ontology IRI (not just version IRI) points to the latest version of the variant is a very good idea and at the heart of so many pipelines we have built over the years (the use_base: TRUE being one of many examples).

I personally think it's easy enough to grep the ontology ID from the current PURL string. Is this the main argument? That there is no one property that allows us to see what "family of ontologies" a specific variant belongs to? In this case, I would suggest we create additional properties instead of repurposing the ontology IRI. What other arguments?

Also, I absolutely do not agree that we don't need variants. Simple/Basic/Base/International etc. serve important purposes even if you don't believe in the concept of "full" (which I personally think is still useful for some use cases).

Not trying to block here, just requesting stronger arguments about what's wrong with the current situation and filing some mild objection against changing the behaviour of some widely used properties.

gouttegd commented 3 months ago

I think the fact that the ontology iri (not just version iri) points to the latest version of the variant is a very good idea.

I disagree. I think it is ridiculous to consider that cl-base, cl-full, cl-simple, etc., are not part of the same “ontology series” (in the sense of the OWL 2 spec). They are merely different versions of what is fundamentally the same ontology. Sure, they are not temporal versions (as two different releases would be, like http://purl.obolibrary.org/obo/cl/releases/2024-01-04/cl-base.owl and http://purl.obolibrary.org/obo/cl/releases/2024-02-13/cl-base.owl), but they are versions all the same. They should be distinguished by their version IRI, not their ontology IRI.

For context: What I want to do is automatically fill the subject_source SSSOM field when generating a mapping set from cross-references. In an ideal world, all ontologies out there would annotate all their classes with rdfs:isDefinedBy, and I could then just use the value of that annotation to get the subject_source.

Alas, at least as far as the use of rdfs:isDefinedBy is concerned, we are not in an ideal world (actually, rumour has it that we are not in an ideal world for plenty of other reasons as well). So when rdfs:isDefinedBy is not available, I’d want to just use the ontology IRI as a fallback. That is, if I am making a mapping between, say, CL:1234 and OTHER:5678, based on a cross-reference carried by CL:1234, I’d set the subject_source to the ontology IRI from which I extract the cross-reference.

But if I do that, and I happen to use something other than the “canonical” release product (the one that has the variant-less http://purl.obolibrary.org/obo/cl.owl ontology IRI), I end up with a wrong subject_source. For example, if I happen to work with the cl-base.owl file (which is what I am most likely to do, since we tend to use base files as much as possible), I end up saying that the source of CL:1234 is http://purl.obolibrary.org/obo/cl/cl-base.owl, which in my opinion is flatly wrong. The ontology a class belongs to should not depend on which artifact is used. CL:1234 belongs to CL (that is, to http://purl.obolibrary.org/obo/cl.owl), regardless of whether I got its definition from the -base, -simple, -full, or -whatever variant.
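
To make the fallback concrete, here is a minimal Python sketch of the logic described above. It is not the SSSOM-Java implementation: the function `subject_source` and the plain-dictionary representation of a class's annotations are hypothetical, chosen only to keep the example self-contained.

```python
from typing import Mapping

RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"


def subject_source(annotations: Mapping[str, str], ontology_iri: str) -> str:
    """Prefer rdfs:isDefinedBy; otherwise fall back to the ontology IRI.

    `annotations` is a hypothetical mapping from annotation property IRI to
    value for the class that carries the cross-reference.
    """
    return annotations.get(RDFS_IS_DEFINED_BY) or ontology_iri


# With the annotation present, the source does not depend on the artifact.
annotated = {RDFS_IS_DEFINED_BY: "http://purl.obolibrary.org/obo/cl.owl"}
print(subject_source(annotated, "http://purl.obolibrary.org/obo/cl/cl-base.owl"))

# Without it, the result depends on which artifact was loaded.
print(subject_source({}, "http://purl.obolibrary.org/obo/cl/cl-base.owl"))
```

The second call returns the cl-base IRI, which is exactly the artifact-dependent result objected to above.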

I personally think it's easy enough to grep the ontology id from the current PURL string.

Sure, it’s easy. But it’s also a needless kludge, one that I am reluctant to bake into SSSOM-Java.

If we have to grep an IRI to get the information we need, I’d rather that we grep the variant out of the version IRI and that we leave the ontology IRI without any extra bit.
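
As a rough sketch of what grepping the version IRI could look like, the Python below splits a version IRI into ontology, release, and variant. The names `VersionInfo` and `parse_version_iri` are hypothetical, and the pattern assumes version IRIs of the form http://purl.obolibrary.org/obo/<ont>/releases/<release>/<ont>-<variant>.owl, with the variant suffix absent for the main artifact.

```python
import re
from typing import NamedTuple, Optional


class VersionInfo(NamedTuple):
    ontology: str
    release: str
    variant: Optional[str]  # None when the file name carries no variant suffix


# Assumed version IRI layout (default OBO PURL base):
#   http://purl.obolibrary.org/obo/<ont>/releases/<release>/<ont>[-<variant>].owl
_VERSION_IRI = re.compile(
    r"^http://purl\.obolibrary\.org/obo/(?P<ont>[^/]+)/releases/"
    r"(?P<release>[^/]+)/(?P=ont)(?:-(?P<variant>[^/]+))?\.owl$"
)


def parse_version_iri(version_iri: str) -> Optional[VersionInfo]:
    """Split a version IRI into its parts, or return None if it does not match."""
    match = _VERSION_IRI.match(version_iri)
    if match is None:
        return None
    return VersionInfo(match.group("ont"), match.group("release"), match.group("variant"))


print(parse_version_iri("http://purl.obolibrary.org/obo/cl/releases/2024-02-13/cl-base.owl"))
```

Under this scheme the ontology IRI would carry no variant at all, and everything artifact-specific would be recovered from the version IRI alone.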

I would suggest we create additional properties instead of repurposing the ontology iri.

I can already imagine how that would pan out… By 2026, we will agree that an additional property is indeed needed. By 2028, we will agree on where that property should live and how it should be named. Nothing will happen in 2029 because someone will suddenly object to creating the property. By 2030, the property will finally be available and if we are lucky, by 2035 15% of our ontologies will have adopted it. /s

Seriously though, if we go through the “new property” route, I would rather have a new property to store the variant and repurpose the ontology IRI to its original meaning.

balhoff commented 3 months ago

@gouttegd I think you make some interesting points... this has bugged me for a long while. I have wanted to be able to say that by using http://purl.obolibrary.org/obo/cl/cl-base.owl I am fulfilling the need for a http://purl.obolibrary.org/obo/cl.owl. However, I do think the spec is indicating temporal versions when they say "series":

In each ontology series, exactly one ontology version is regarded as the current one.

matentzn commented 2 months ago

@gouttegd I see your points.

Here are my primary concerns:

  1. We have been much faster at adding new properties and rolling them out than in the past. We can do it in 6 months for all ODK-managed repos.
  2. How much infrastructure will break if we suddenly change the way we assert ontology IRIs? There is so much to consider.
    1. People that use ontology IRIs in owl:imports statements. It's not out of the question that http://purl.obolibrary.org/obo/cl/cl-base.owl and friends are used.
    2. Tools that use the ontology IRI in the ontology to obtain a link to download a newer version of the file will break.
    3. Tools that use the ontology IRI to document the file source will not work.
    4. I do understand that the above don't sound right, but this is like Chris thinking that axioms more complex than OWL EL should not exist - they do, and therefore we should be a bit careful not to break stuff in a generic tool that should make life easier for everyone.
    5. Lastly, variants are often very (very) different from each other. Different axioms, different signature, different everything. Different labels! (in the case of, say, the French release of HPO)

All that said, I am not unsympathetic. But we should not do this unilaterally if we do it. I also seem to remember some weird thing with catalog.xml and the ontology IRI (somehow I remember the owl import URL must be the same as the ontology IRI for the catalog referencing in Protege to work, but I am only 51% sure my memory is not betraying me).

gouttegd commented 2 months ago

How much infrastructure will break if we suddenly change the way we assert ontology IRIs? There is so much to consider […]

That is pretty convincing. OK, I don’t like it, but I will agree that it’s probably best to leave the ontology IRI as it is.

As for using a new property to store the real name of the ontology: that’s the next best solution, but I completely lack the motivation to argue for such a property to be created. If anyone wants to tackle that, be my guest.

matentzn commented 2 months ago

Followed up on it here: https://github.com/information-artifact-ontology/ontology-metadata/issues/171