In IRIs, use opaque identifiers instead of english labels

alanruttenberg commented 3 years ago

OBO Policy was designed for good reasons.

First, by using interpretable labels you potentially alienate or confuse users in different communities where terms are known by different names.

Second, we want our ontologies to be used worldwide, and using English in IRIs is not welcoming to non-English speakers. The sanctioned mechanism for providing user-readable labels is to use rdfs:label or SKOS properties, and literals with language tags.

Third, there will inevitably be cases where words are spelled wrong, or disputed, which makes for pressure to "fix" the IRIs. Unfortunately, such fixes are typically breaking changes to users.

neilotte commented 3 years ago

Strongly agree.

The main argument for keeping English-readable labels in the IRI is that it makes it easier for developers to recognize a URI on sight. However, it only requires a little re-orientation (e.g. an extra line in a SPARQL query to return labels) to get around this, and if this is really insufficient, then a local mapping could be applied to give all opaque identifiers readable URIs within a local environment. The utility of recognizing URIs at a glance is also undermined, to some degree, when the words in a URI are ambiguous (e.g. 'Document' referring both to a noun and a verb--documenting), and developers navigating the ontologies by URI alone may inadvertently introduce errors in this way.

There are different strategies for versioning ontologies, but I think this also greatly improves trust in each release. If the extension of a URI changes between versions, it is much easier for end users to have this URI deprecated and replaced, rather than require the end user to treat each URI as distinct from the last w.r.t. each release of the ontology with a new version IRI. However, if there are human readable labels in the URI, then there is considerable pressure to maintain the URI. This means IRIs in subsequent versions of the ontology can't really be trusted by default, since their extensions may have changed between versions while the URI remained the same.

nklsbckmnn commented 3 years ago

I couldn't agree more.

harefb commented 3 years ago

We agree that by using an alphanumeric IRI similar to BFO's we increase the potential for international adoption of the standard. This is important from a DoD perspective when encouraging NATO and other non-English-speaking allies to conform to the standard. We also agree that it reduces the potential for ambiguity (and endless debate) on the term labels if we already agree on the concept being captured by the entity and its place in the taxonomy.

However, we are concerned that the alpha-numeric IRIs make it difficult for users to work with some tools that display the IRI only (e.g., ONTOP) in their user interfaces so we appreciate efforts to reduce the related burdens to adoption.

Once we get some lessons learned from this process, we will probably follow suit with the DICO.

mark-jensen commented 3 years ago

I am on the fence as to the value of switching. I can see both sides, perhaps with some preference towards the use of numerics. There will be a fair bit of upfront work in making the switch, e.g., to workflows and development tools, visualizations based on IRIs vs labels, etc. If agreement is reached, I will advocate that enough time is provided to test and validate before we fully adopt the new IRI standard.

Some initial thoughts on consequences for existing user-applications:

Actively used applications may not need to be updated, provided their data models are stable, there is no or minimal ongoing development, and no integration is happening with external data sources that use the new IRIs. If the latter is the case, assuming the external sources are not using new content from CCO (a dangerous assumption), then perhaps a simpler and more cost-effective solution could be to create a separate mapping to integrate the new IRIs, rather than updating the legacy code to use new IRIs. I guess cost/benefit would depend on the complexity of the application.

If an actively used system is in development, i.e., new data sources are being mapped or content updates are desired, then updating to use new IRIs is necessary. However, a potential solution that doesn’t require updating code to use the new IRIs is to take new releases of CCO with updated IRIs and convert them back to IRIs that use natural language, mirroring the old IRI style and thus matching its use in the legacy code. This goes a step further than a mapping as noted above. It uses a mapping to transform the ontology before it is embedded into code for development, essentially creating something akin to an IRI-normalized version of CCO, one that allows new releases to merge with unmodified code that used the old IRIs.

cco:CCO_00002021 [Act of Communication] >>> cco:ActOfCommunication

Terms introduced since the update, i.e., where no old IRI existed, could also be straightforwardly converted.

cco:CCO_0023828 [Act of Fostering] >>> cco:ActOfFostering
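The back-conversion described above could be sketched roughly as follows. This is only an illustration, assuming that legacy class names are formed by capitalizing each word of the rdfs:label and joining them; the `labels` table is a hypothetical stand-in for whatever a release actually provides, and properties or individuals may follow a different naming style.

```python
import re

def label_to_local_name(label: str) -> str:
    """Turn a label such as 'Act of Communication' into 'ActOfCommunication'."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    # Upper-case only the first character of each word and leave the rest
    # untouched, so that acronyms keep their casing.
    return "".join(word[0].upper() + word[1:] for word in words)

# Hypothetical opaque-ID-to-label table (the pair is taken from the example above).
labels = {
    "CCO_0023828": "Act of Fostering",
}

# Assumed result shape: opaque ID -> legacy-style local name.
legacy_names = {opaque: label_to_local_name(label) for opaque, label in labels.items()}
print(legacy_names)
```

A derived name is only as stable as the label it comes from, so a published mapping table would still be needed for terms whose labels later change.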

Thinking about the actual transformation of existing code to use the new IRIs, I am trying to find edge cases that could create problems. I currently don’t see any. It seems a simple script would do: search each file line by line for two IRI patterns (full IRI and prefixed), look up each match in a key/value mapping of old to new term names/IDs, swap, and write the result to a new output file. I suppose that the actual prefix used in some files may differ (we can’t assume it will always be “cco”), but that’s still fairly simple to work around. Can anyone think of more complex cases where a general script like this fails?
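A minimal sketch of such a script is shown here, under stated assumptions: the namespace is the one quoted elsewhere in this thread, the two mapping entries are illustrative only, and the function names are hypothetical. It rewrites both full and prefixed forms and accepts a configurable list of prefixes.

```python
import re

# Namespace as quoted elsewhere in this thread.
NAMESPACE = "http://www.ontologyrepository.com/CommonCoreOntologies/"

# Illustrative old-name -> new-ID pairs; a real run would load the full mapping.
OLD_TO_NEW = {
    "ActOfDating": "CCO_0000001",
    "ActOfFostering": "CCO_0023828",
}

def build_pattern(prefixes):
    """Match the full namespace or any given prefix, followed by a local name."""
    heads = [re.escape(NAMESPACE)] + [re.escape(p + ":") for p in prefixes]
    # The lookbehind stops "cco:" from matching inside a longer prefix ("mycco:").
    return re.compile(r"(?<![\w-])(" + "|".join(heads) + r")([A-Za-z_][\w-]*)")

def migrate_text(text, prefixes=("cco",)):
    """Return text with every known legacy local name swapped for its opaque ID."""
    pattern = build_pattern(prefixes)

    def swap(match):
        head, local = match.group(1), match.group(2)
        # Unknown names are left untouched so they can be reviewed by hand.
        return head + OLD_TO_NEW.get(local, local)

    # Process line by line, keeping the original line endings.
    return "".join(pattern.sub(swap, line) for line in text.splitlines(keepends=True))

sample = "cco:ActOfDating rdfs:subClassOf cco:Act .\n"
print(migrate_text(sample), end="")
```

Two gaps are visible even in this sketch: it also rewrites matches that occur inside string literals or comments, and it does not read the prefix declarations of the file being processed, so the caller has to supply the prefixes in use.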

It seems the actual transforms will be easy. But the testing and validation, I think, will be more onerous and costly for users. And of course, for users who rely on the name/ID being readable, e.g., for lots of SPARQL writing or use of viz tools such as KARMA, the change will induce a steep learning curve and perhaps require new tooling. That’s the biggest reason I can see for not making the change.

rorudn commented 3 years ago

I agree that the Common Core Ontologies should convert to use opaque identifiers as IRIs, on the condition that before such a version of the CCO is released some effort is made on 1) testing conversion scripts on files generated and used by a variety of different applications and 2) reaching out to developers, both open source (e.g. OnTop, KARMA) and commercial (e.g. Ontotext), to ask for the ability to switch from viewing IRIs to viewing labels in a chosen language.

bdonohue29 commented 3 years ago

Agree both with the goal of switching to alphanumeric IRIs and the need to test/validate/manage the change so as not to unduly break things for CCO consumers.

But this could be done in stages, right? Would the team consider publishing a v2.0 (with alphanumeric IRIs) to run in parallel with the current v1.3? CUBRC could provide the info necessary to map from one to the other (whatever that might look like... e.g. could be an annotation in the ontology itself). But CCO consumers would then have a specified timeframe (e.g. a year) where they could migrate, test/validate, etc, but without anyone forcing them to convert to v2.0 immediately. Just food for thought.

neilotte commented 3 years ago

@rorudn What would be included in "the variety of different applications"? And are there known issues with opaque IDs that have been encountered with OnTop, Karma, or Ontotext? Given how common the practice is of using opaque IDs in URIs, I guess I'd be a little surprised if one of the widely used tools really required human readable IDs presently.

@mark-jensen Regarding "However, a potential solution that doesn’t require updating code to use the new IRIs, is to take new releases of CCO with updated IRIs and covert them back to IRIs that use natural language, mirroring the old IRI style and thus matching its use in the legacy code. " I think that makes sense. If you published a table of such a mapping in the initial release switching to opaque ids, this should be sufficient for anyone requiring human readable URIs to continue doing so for the time being. Sounds like a good solution. (Just a thought too: since the user base is growing, it might be nice to have a mailing-list for CCO that users could hop on or off of. This would allow you to survey your user base, understand their needs, and make announcements like this).

harefb commented 3 years ago

The DICO team likes the approach outlined above by Mark and Brian, and Neil's recommendation for how the community might better share lessons learned and tips for dealing with the opaque IRIs as they are being implemented. We will undoubtedly run into issues as simple as reading a .ttl file directly (which is a useful tool to show people how straightforward the modeling is).

APCox commented 3 years ago

I'm not specifically opposed to changing to alphanumeric IRIs; however, I think that some of the arguments against continuing to use natural language IRIs are not as strong as they may initially appear.

Alan's first and second points are versions of the same claim, namely: human-readable English IRIs are not helpful to some ontology users. While true, the solution in both cases is the same as for alphanumeric IRIs -- use the rdfs:label. For example, http://www.ontologyrepository.com/CommonCoreOntologies/Document currently has the annotation: rdfs:label [language: en] "Document" Any ontology that will be actively used by a group of non-English speakers should also have a complete set of rdfs:label annotations with values from the language in question, e.g.: rdfs:label [language: es] "el documento"

With the exception of users whose primary language uses a non-Latin alphabet, I would contend that the use of English-based IRIs is not specifically less friendly to non-English speakers than alphanumeric IRIs are to everyone. That being said, I would NOT want to try to type out IRIs in Cyrillic, Arabic, or Chinese characters. Additionally, while it is typically easier for English speakers to remember an English word or phrase than to remember a quasi-random 7-digit number, I grant that extra-long IRIs can be cumbersome in their own right simply because of their length in comparison to a standardized 11 character local name (e.g. CCO_0123456).

Regarding different communities using different terms differently, if the local IRI isn't specific enough, hopefully the rdfs:label is. If, however, that also fails to satisfy, that's what we have more specific annotation properties for. Specifically, 'alternative label' is used frequently in the CCO to help address this issue. For example, 'Combustion' includes 'Combustion Process' and 'Burning Process' as alternative labels. If that is still insufficient, users are free to create their own preferred label annotation property to use for their specific project. The point here is that, whichever solution is used to handle community-based terminology disagreements, it will be the same solution regardless of how the IRI is structured.

Alan's third point -- misspelled IRIs -- is a fair criticism that only applies to human-readable IRIs. However, given that every new CCO term must be vetted by a working group composed of highly motivated and detail-oriented volunteers, I doubt that this situation will arise frequently enough for it to outweigh the benefits of human-readable IRIs. Furthermore, depending on the situation, the term can either (a) be deprecated and replaced, or (b) forever remain misspelled (with a corrected rdfs:label if necessary).

Neil argues that ambiguity can undermine the utility of human-readable IRIs. Fair enough, but since this issue also applies to rdfs:labels, the solutions are the same in both cases. Namely, developers should design the IRI and label of each term to be sufficiently unambiguous and should provide quality human-readable definitions for each term. In my experience using OBO Foundry ontologies and reviewing projects that use them, there are inevitably errors caused by users looking at nothing more than the term label. For example, one ontology uses 'Hospital' to represent the healthcare facility, another ontology uses 'Hospital' to represent the healthcare organization, and a user decides that both terms are equivalent in their application. This is a simple example that could be avoided by using more precise labels (e.g. 'Hospital Facility' and 'Hospital Organization'), but avoiding all such problems requires more than just a well-designed ontology.

The most compelling argument I've seen against using natural language IRIs is Neil's point about what should happen if we change the meaning of a term significantly enough that we decide to deprecate it. We could choose to keep the IRI and make a note of the change in the release notes, but Neil points out that doing so could cause users who don't check the release notes closely enough to start using the term incorrectly. If instead we deprecate the term, users will be forced to resolve the issue when their models, queries, etc. break due to the obsoleted IRI. This approach is arguably more user-friendly, but it could put the developers in an awkward situation if natural language IRIs are used because term 'X' is now unavailable (at least for the current release) to be used. This means that another, perhaps less than ideal, term must be used instead. I expect that this sort of scenario will occur very rarely for mature ontologies, however it is an awkward situation to be in and it does not affect ontologies that use alphanumeric IRIs.

As has been pointed out by at least Mark, Ron, and Forrest, the main reason for using natural language IRIs is to facilitate the use of the ontologies. Not every semantic tool is currently built to leverage rdfs:label annotations, programmers find it easier to work with meaningful IRIs, and writing queries, mappings, etc. using natural language IRIs is significantly faster/easier. Granted, there are workarounds for at least some of these use cases, but more needs to be done to increase support for developers and users alike.

One such workaround is the use of custom prefixes for individual terms in SPARQL queries. For example: PREFIX has_part: <http://purl.obolibrary.org/obo/BFO_0000051> and then we can write, e.g.: ?s has_part: ?p . (note the trailing colon, which is required when a prefix stands for the whole IRI) instead of: ?s obo:BFO_0000051 ?p . in our query.

This solution works, but it adds more work for query writers because every new term used in a query means looking up and adding a new prefix. This burden can be partially mitigated by keeping a file with common prefixes handy to be copy and pasted into queries, but maintaining such a file is a burden in itself. Furthermore, in cases where a query (without individual term prefixes) is already 100+ lines, implementing the prefix solution only increases the length of the query and complexity of maintaining and troubleshooting it.

rorudn commented 3 years ago

@neilotte The KARMA mapping tool displays term IRIs to the end user in the process of mapping data. I understood @harefb to be saying that this is true also of the OnTop tool. Other applications that I think should be encouraged to facilitate ease of use with opaque IRIs are SPARQL query editors and programming IDEs. BTW, in my opinion you underestimate the difficulty users will experience with SPARQL query editors when using ontologies with opaque IRIs. In my estimate building queries using the usual workarounds when the number of terms enters the hundreds will end in a result that is difficult to comprehend, which hinders sharing and debugging, and which can break length constraints.

neilotte commented 3 years ago

@rorudn There's a comment at the bottom of this thread (https://github.com/usc-isi-i2/Web-Karma/issues/217) indicating KARMA can display by rdfs:label. I'm not a regular KARMA user these days so can't verify this myself. Dave Lutz would be a good person to reach out to regarding label rendering in OnTop.

I could be underestimating the difficulty. I'd be interested in seeing an example of the sort of query that would be difficult to translate. Right now, the SPARQL interface in GraphDB allows for autopopulating prefix statements AND automatically recognizing resources specified within a prefix statement and populating a dropdown within the query interface. This makes for a fairly intuitive interface for query building, even with opaque identifiers.

rorudn commented 3 years ago

Draft Motion: The working group will test and evaluate use of a version of the CCO having opaque identifiers. Testing will be end-user testing of the version in a limited number of applications known to be used by consumers of the CCO. Evaluation will be the generation of a sample of conversion scripts that update files created in the test applications using current or past versions of the CCO to the test version of CCO. Upon completion of the testing and evaluation the working group will consider ("vote on") the adoption of the new version.

rorudn commented 3 years ago

@neilotte I was unaware of both the adoption of labels in KARMA and the capability to use labels in the SPARQL editor of GraphDB. It even allows the choice of language tags. Very cool, thanks.

harefb commented 3 years ago

The ONTOP mapping interface does pull from the IRIs (everything after the “/” or “#”). I do not think you can configure it to render labels in the mapping interface. It is fairly rudimentary since it is open source.

And, of course, there is the fact that it’s nice when you can read the line directly in the .ttl file with a text editor and not require some tool to parse the verbiage to understand the triples. I would like to think that was one of the principles that drove the development of the semantic web in the first place.

All that said, we still support the move to alphanumeric designators. The fact that I can understand an IRI by reading the words in it means that I can also get caught up in my own interpretation of what the word means, which might be different from the underlying intent of placing the entity in that “spot” in the ontology. Good semantics getting in the way of good semantics, semantically speaking…

V/R Forrest

rorudn commented 3 years ago

Revised Draft Motion: We agree, in principle, to convert CCO to opaque identifiers, pending further testing.

mark-jensen commented 3 years ago

I agree with the revised motion.

@neilotte @harefb @bdonohue29 I am sure a branch to start testing is coming soon. Re providing a mapping between old and new IRIs, as users, would you prefer a simple two-column .csv, or something RDF-based, such as a supplemental file containing equivalency axioms or use of an annotation prop on new terms (e.g., CCO_0000001 legacy_term_name "ActOfDating")?
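To make the two options concrete, here is a rough sketch that emits the same mapping both as a two-column CSV and as annotation triples in Turtle. The column names, the `cco:` prefix on `legacy_term_name`, and the second mapping row are assumptions for illustration only.

```python
import csv
import io

# Illustrative (legacy local name, opaque ID) pairs.
MAPPING = [
    ("ActOfDating", "CCO_0000001"),
    ("ActOfFostering", "CCO_0023828"),
]

def to_csv(mapping):
    """Render the mapping as a two-column CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["legacy_local_name", "opaque_id"])  # assumed column names
    writer.writerows(mapping)
    return buffer.getvalue()

def to_turtle(mapping):
    """Render the mapping as one annotation triple per term."""
    lines = [
        "@prefix cco: <http://www.ontologyrepository.com/CommonCoreOntologies/> .",
        "",
    ]
    for legacy, opaque in mapping:
        # legacy_term_name is the property name floated above; putting it in
        # the cco: namespace is an assumption.
        lines.append(f'cco:{opaque} cco:legacy_term_name "{legacy}" .')
    return "\n".join(lines) + "\n"

print(to_csv(MAPPING))
print(to_turtle(MAPPING))
```

The CSV is easier to consume from scripts, while the annotation form travels with the ontology file itself; since one can be generated from the other, publishing both costs little.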

harefb commented 3 years ago

DIA/SAIC team agrees with revised motion. If the terms all still have the English language labels, I think we could just make our own two-column CSV files for the cheat sheets. But a concatenated annotation property might be useful too.

alanruttenberg commented 3 years ago

Some further comments.

Tooling built for people to interact with can be engineered in a way that supports readable labels, alternative or community-chosen labels, and the ability to switch among them. Some of the objections to this proposal boil down to "yes, but that takes effort". Indeed it does, but not an inordinate amount. SPARQL queries are often written by hand and so are considered to be a pain point, but there are at least two reasonable workarounds. The first, which @neilotte mentions, is to add a line to the query to retrieve the IRI based on the label. The other, as @APCox suggests, is to use a convention of defining readable prefixes for URIs. An example can be seen in an appendix to a paper about one of my projects.

Such prefixes look like they would be extra work to add, but they don't need to be added manually. Many SPARQL front ends have facilities to autocomplete prefixes, and that can be taken advantage of. I've used [YASGUI](https://github.com/TriplyDB/Yasgui) and did a prototype at one point that did this, automatically adding the prefix definition when the prefix was autocompleted within a query. IIRC there is a similar facility in GraphDB's query editor. In other cases I've written code that constructs SPARQL queries programmatically, and in that case the prefixes are added as needed. Similarly, for display of results one can bind the labels in the queries or process the results (again within a SPARQL user interface) to automatically replace IRIs in results with hyperlinked labels.
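As a rough illustration of adding prefixes as needed, the sketch below scans a query body for readable per-term prefixes and prepends only the declarations that are actually used. The registry contents and function name are hypothetical; only the BFO_0000051 entry comes from this thread. Note that a prefix standing for a whole IRI is written with a trailing colon (`has_part:`).

```python
import re

# Hypothetical registry of readable prefixes kept alongside the project.
TERM_PREFIXES = {
    "has_part": "http://purl.obolibrary.org/obo/BFO_0000051",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def add_prefixes(query_body):
    """Prepend PREFIX declarations for every registered prefix used in the body."""
    used = []
    # Collect names that appear immediately before a colon, in order of first use.
    for name in re.findall(r"\b([A-Za-z_][\w-]*):", query_body):
        if name in TERM_PREFIXES and name not in used:
            used.append(name)
    header = "".join(f"PREFIX {name}: <{TERM_PREFIXES[name]}>\n" for name in used)
    return header + query_body

QUERY_BODY = """SELECT ?whole ?part ?label WHERE {
  ?whole has_part: ?part .
  ?part rdfs:label ?label .
}
"""

print(add_prefixes(QUERY_BODY))
```

Because the declarations are generated, the hand-written part of the query stays short and readable, and the registry can be regenerated from the ontology's labels whenever a release changes.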

In most other cases there really ought not be exposure of developers to raw IRIs in the first place. In most cases, one shouldn't hand edit RDF as it is too prone to errors. Instead APIs that generate the RDF, of which there are several and in different programming languages, will be preferable. So, I don't think it's nice to be able to read a ttl file. I suspect a rather small minority of the eventual audience for CCO will consider that a benefit. In order to achieve wide adoption tooling will, in any case, need to be developed in a way that is relatively easy for consumers to work with.

As @APCox notes, using labels in software, as opposed to IRIs, is still open to issues of labels changing over time. In the ideal case software is written to establish the link between label and IRI at the time of authoring, maintain the IRI as the primary identifier when documents are stored, and dynamically add back labels on display. Here too there are choices. One can always display the same label as was originally used, but with confidence that the correct IRIs will be maintained, or allow dynamic choice of label source and display using the most current label from a given source.
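The store-by-IRI, display-by-label pattern might look something like the following sketch. The label table is a hypothetical in-memory stand-in for labels loaded from an ontology; the IRI and the two labels are the ones quoted earlier in this thread.

```python
DOCUMENT = "http://www.ontologyrepository.com/CommonCoreOntologies/Document"

# Hypothetical label source: IRI -> {language tag: label}.
LABELS = {
    DOCUMENT: {"en": "Document", "es": "el documento"},
}

def display_label(iri, lang="en", fallback="en"):
    """Pick a label for display; fall back to another language, then to the IRI."""
    by_lang = LABELS.get(iri, {})
    return by_lang.get(lang) or by_lang.get(fallback) or iri

# What gets stored refers to the term by IRI only.
stored_record = {"type": DOCUMENT}

# Labels are attached at display time, in whatever language the user chose.
print(display_label(stored_record["type"], lang="es"))
```

Since the stored record never contains a label, relabeling a term or adding a new language changes what users see without touching saved data.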

Getting in the habit of tooling software to be friendly on entry or to display using labels is just good practice. It yields benefits as soon as one expands to a wider community of users and collaborators. Investments are relatively minor when viewed in a context of an ecosystem that is intended to last, or even compared to the number of person hours that will be spent on CCO's standardization.

Even if CCO keeps using labels in IRIs, tooling will still have to be developed to use opaque IRIs because other ontologies, including BFO, are using them. I've seen too many cases, now, where BFO terms are incorporated without labels, and tools that assume IRIs include labels end up displaying the opaque IRIs when there is a perfectly good label. With a uniform practice of using opaque IRIs, ontologies that use them will no longer be at a disadvantage.

@mark-jensen proposes tooling that creates label-style IRIs from the opaque IRIs. I think that's a very bad idea. What we absolutely don't want is to have terms that mean the same thing but with different IRIs.

I support @bdonohue29's suggestion that for the purposes of having a manageable transition, the version number is bumped and the last label IRI version continues to be available, but is not further developed. However, it should be made clear that any work that needs to use a new term only available in the new version, or which depends on interchange of CCO structured data needs to adopt the opaque IRI version. OWL's owl:incompatibleWith can be used to make clear that the two versions are not compatible.

@rorudn's revised draft motion is a good way to start.

See also #108, #109 which also pertain to the form of IRIs.

alanruttenberg commented 3 years ago

During today's MLO meeting, Brian Haugh suggested that another suggestion I made - to use foaf or other external ontology terms in some cases - was inconsistent with my view on opaque IRIs, since, for example, foaf does use natural language in the IRIs.

I don't think these views are inconsistent. The suggestion is that we shift towards always using labels and hiding IRIs. There are several points I make, but one of them is that tooling be built to uniformly use labels. If we use external terms that use natural language and don't have a label, the idea would be to assert a label for them. Doing that means we can have a uniform policy that labels are available and should be used in tools.

mark-jensen commented 3 years ago

In the near future, we shall be creating a branch with the numeric IRIs for testing. Feedback is requested before we do.

1. Following OBO, the format is: CCO_0000000 to CCO_9999999.
2. Regarding the idea of adding meaningful ranges to the assignment of the numeric IDs, e.g., all Information Content Entities get CCO_0000001 - CCO_0005000, etc.: The more I think about it, the less enamored I am of the idea. It’s not clear what practical benefit this will actually bring in usage. It leads to bookkeeping overhead, may slow down adding new content if editors have to confirm which range to create new IRIs in, and will require additional validations before release. It’s less extensible than simply randomizing the process, for it could break as the ontologies grow or get refactored, e.g., as IRIs get deprecated. Other thoughts on pros/cons?
3. CCO has named individuals in the ontologies, e.g., for measurement units. I assume these too should be made numeric?
4. The same goes for annotation and data properties; presumably we should replace these as we do object properties? CCO created its own annotation properties, roughly matching the IAO ones used in BFO, keeping only RDFS label and comment, mostly just because of the convenience of having readable IDs. I noticed BFO2020 uses SKOS for annotations now, which have language-based IDs. @alanruttenberg was that change due to a requirement of the ISO standardization process, or for some other reason? As an extension of BFO and upcoming standard, what do you recommend for CCO? Should we switch to use of SKOS rather than numericise our annotation props?

nklsbckmnn commented 3 years ago

I agree with your view on meaningful ranges. It seems to me to make more sense to assign ranges to editors that they can then use for auto-generation.

An easy way to convert could be hash(IRI) % (10 ** 7) in Python (although there might be collisions).
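One caveat with the expression above: Python's built-in `hash()` is salted per process for strings unless `PYTHONHASHSEED` is fixed, so it would give different IDs on each run. A sketch of the same idea with a stable digest, plus an explicit collision check, might look like this. The function names and the `CCO_` prefix default are illustrative assumptions.

```python
import hashlib

def opaque_id(iri: str, prefix: str = "CCO_", digits: int = 7) -> str:
    """Derive a repeatable fixed-width numeric ID from an IRI."""
    digest = hashlib.sha256(iri.encode("utf-8")).digest()
    number = int.from_bytes(digest, "big") % (10 ** digits)
    return f"{prefix}{number:0{digits}d}"

def assign_ids(iris):
    """Map each IRI to its derived ID, refusing to continue on a collision."""
    owner_of = {}
    for iri in iris:
        new_id = opaque_id(iri)
        # setdefault returns the existing owner if the ID is already taken.
        if owner_of.setdefault(new_id, iri) != iri:
            raise ValueError(f"collision: {iri} and {owner_of[new_id]} both map to {new_id}")
    return {iri: new_id for new_id, iri in owner_of.items()}

NAMESPACE = "http://www.ontologyrepository.com/CommonCoreOntologies/"
ids = assign_ids([NAMESPACE + "ActOfDating", NAMESPACE + "ActOfFostering"])
for iri, new_id in ids.items():
    print(new_id, iri)
```

With only 10^7 possible values, collisions are not negligible: by the usual birthday estimate the chance of at least one is roughly n²/(2·10^7) for n terms, or about 5% at 1,000 terms, so the check (and a manual fallback for clashes) matters.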

neilotte commented 3 years ago

@mark-jensen

1. I suggest using hyphens in the local IDs rather than underscores, e.g. CCO-0000000 rather than CCO_0000000. This is a common best practice for URLs.
2. Recommend maintaining a registry in your local dev environment where you can reserve a URI in the range CCO-0000000 to CCO-9999999. Every time you want a new one, you'd grab it from the registry, and then you can use it and no one else can. This would preclude the need to maintain different ranges for different domains, which could get messy.
3. Maybe one structure for classes and properties and a different prefix for individuals, e.g. CCI-0000000?
4. Definitely on board with using skos, dcterms, IAO, and other annotation properties wherever appropriate.

alanruttenberg commented 3 years ago

1. The rationale for using underscore instead of hyphen is that hyphens are parsed as spaces by search engines but underscores are not. Treating hyphens as spaces works well for search in cases where the separated things are meaningful on their own, but in this case they are not. It is probably not desirable to return pages with the term CCO in one place and 0000000 in another place if you search for CCO-0000000.
2. A registry is nice, but it has to be engineered to make sure there aren't race conditions that might result in two people getting the same ID. It would be nice if this was something built into Protege. People sometimes use GUIDs because then you can allocate, without coordination, ones that are highly improbable to collide. In OBO, IIRC, the ranges were allocated to groups maintaining different parts of the ontology. In that case it is easier to coordinate on who gets an ID. Grouping by upper-level type might make coordination harder.
3. Numeric, yes. Neutral on different prefixes. There's actually an argument that we shouldn't use meaningful prefixes because it results in a social cost if terms need to be moved to other ontologies where they can be better maintained. It's not rational, but people definitely get attached to the idea that all the terms in an ontology have the same prefix. I think that's why the handle system used numeric prefixes.
4. +1
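The race-condition concern and the ranges-per-group idea can be illustrated together with a small sketch. The class, the editor names, and the ranges are all hypothetical. A lock like this only protects allocations within one process; a registry shared across machines would need an equivalent guarantee from a server or a database transaction.

```python
import threading

class IdRegistry:
    """Hands out sequential IDs from a per-editor range, one caller at a time."""

    def __init__(self, ranges):
        # ranges: editor name -> (first number, last number), inclusive.
        self._lock = threading.Lock()
        self._ranges = dict(ranges)
        self._next = {editor: first for editor, (first, _last) in self._ranges.items()}

    def allocate(self, editor):
        # The lock makes read-check-increment atomic, so two threads can
        # never be handed the same number.
        with self._lock:
            _first, last = self._ranges[editor]
            number = self._next[editor]
            if number > last:
                raise RuntimeError(f"range exhausted for {editor}")
            self._next[editor] = number + 1
        return f"CCO_{number:07d}"

# Hypothetical editors with non-overlapping ranges.
registry = IdRegistry({"alice": (1, 5000), "bob": (5001, 10000)})
print(registry.allocate("alice"))
print(registry.allocate("bob"))
```

Giving ranges to editors rather than to upper-level types means nobody has to decide which range a new term belongs in, and a refactored term never ends up with an ID that appears to be in the wrong block.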

mark-jensen commented 3 years ago

@alanruttenberg @neilotte @bdonohue29 @eliasweatherfield @harefb

A version of CCO using numerics is now available for testing here

There is a mapping file here

We stuck with underscores and made no meaningful ranges of IDs to separate entities. One thing that did come up in discussion was the idea of making properties with inverses have IDs in sequence, which seems to be fairly common in OBO-land. I can see some benefit to grouping them like that, but only a small one to users that routinely interact with IRIs in certain ways.

Please follow back with any ideas for revision, or concerns, potential problems and so forth.

BrianHaugh commented 2 years ago

I find the proposal to use numeric URIs in CCO unreasonable for the following reasons

The rationale for using numeric URIs is completely undercut in OWL ontologies by the use of rdfs:label to provide a standard label for all elements of an ontology. Use of rdfs:label has the effect of establishing a common name for each element. Thus, if non-English language users have an objection to an English language IRI, they would have the same objection to a standard English language label. Using numeric URIs that are difficult to comprehend just moves the standard name issue to rdfs:label.
It is useless in OWL to try and have multiple alternative values for rdfs:label since it would be difficult for any tools to distinguish between them and identify whatever preferred label any particular community would like to display to users.
Communities that want to display different labels do well to use different label annotation properties for their community so that they are readily distinguished and easy to select for display. Protege, in particular, provides mechanisms for identifying preferred label annotation properties for display in Protege.
When resolving errors or warnings from reasoners, it is frequently necessary to use a text editor to find the source of the problem in the ontology file. But, it is very difficult to recognize classes and properties that are using purely numeric URIs when using a text editor. Having an English language term that is readily recognizable greatly facilitates handling such issues in a text editor.
There are also other tools that do not display labels in their user interface. Some are cited above. I have encountered this in a Natural Language Processing tool, which reported NLP extractions from text using the URIs from ontologies. These were difficult to follow when BFO classes were involved.

bdonohue29 commented 2 years ago

@BrianHaugh I find your arguments rather unpersuasive. To address each in turn:

"Use of rdfs:label has the effect of establishing a common name for each element." If by "common" you mean "standardized," this is simply false. The rdfs:label property may be used in this way to try to establish a globally shared linguistic term (as in a glossary, lexicon, etc), but it needn't be, shouldn't be, and in reality, can't be, because there is no such thing as a globally standardized term. (Thanks a lot, Tower of Babel.) Practically speaking, an ontologist must select a default term for something, but this is by no means the best, clearest, or exclusive way to refer to a class or property in ordinary human language or even in technical jargon. This is okay, because the terminology is not the means by which we align disparate data. The URI is. The role of terminology is just to help a relatively restricted set of users who need to be able to interpret the intent of the ontology accurately. That's all.
"It is useless in OWL to try and have multiple alternative values for rdfs:label since it would be difficult for any tools to distinguish between them and identify whatever preferred label any particular community would like to display to users." In argument 3, you mention alternative annotation properties. That's one viable way to do it. Even in Protege, you can specify which annotation property you want to use to render as the label. Another is to annotate the annotation (e.g., with the source or as terminologically preferred by a particular community). This is easy to add in Protege, and trivial for a tool to query. If a tool cannot do this simple operation, it is a deficient tool.
No disagreement here: a community certainly can use a custom label annotation property. They could also do something like: define them as sub-properties of rdfs:label (or some other generic label annotation property) to allow an inference-based query of all labels. But this isn't an argument against opaque numeric IRIs. Protege, among other tools, tolerates different "views" on the same underlying ontology. That's what numeric IRIs are trying to promote as well: a shared representation of reality amid inescapably diverse conceptualizations and terminologies used to describe reality.
Are annotations not queryable in a text editor?
My advice would be to use better tools.

And additionally, you don't provide any arguments against the many benefits from using neutral numeric IRIs, e.g., that in the social world terminological preferences change rapidly over time, but IRIs never should.
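
As a rough illustration of the label selection discussed in points 2 and 3 above (a priority-ordered list of annotation properties, combined with language tags), here is a small Python sketch. It uses plain tuples rather than a real RDF library, and every IRI, property name, and label in it is made up for the example.

```python
# Annotation assertions as (subject IRI, property, text, language tag) tuples.
# All IRIs, property names, and labels below are hypothetical.
ANNOTATIONS = [
    ("ex:0000052", "rdfs:label", "Facility", "en"),
    ("ex:0000052", "rdfs:label", "Einrichtung", "de"),
    ("ex:0000052", "ex:army_label", "Installation", "en"),
]


def display_label(iri, property_priority, lang_priority, annotations=ANNOTATIONS):
    """Return the first label matching the priority-ordered properties and languages.

    Properties are tried in order; within each property, languages are tried
    in order. Falls back to the IRI itself when no annotation matches.
    """
    for prop in property_priority:
        for lang in lang_priority:
            for subject, p, text, tag in annotations:
                if subject == iri and p == prop and tag == lang:
                    return text
    return iri
```

A community that prefers its own terms would put its label property first, e.g. `display_label("ex:0000052", ["ex:army_label", "rdfs:label"], ["en"])`, while everyone else falls through to `rdfs:label`. The IRI stays the same in both views.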

APCox commented 2 years ago

@BrianHaugh I address your latest comments in order below. Please see my March 1 comment above for more of my thoughts on this matter.

You claim that the semantics provided by the values of rdfs:label undercut the use of alphanumeric IRIs. This simply isn't true. Given that ontologies are semantic representations by nature and the label annotation is specifically designed to accommodate these semantics, a human-readable label ought to be included. But it is the IRI -- not the label -- that sets the "common name for each element". Furthermore, as you state in your third point, it is possible to use preferred labels or other custom annotation properties to capture terminological differences for the same type of entity. Ultimately, the label together with the textual definition, logical definition, and other annotations is what determines the sort of entity represented by that IRI. The label is a small (albeit highly visible) part of the semantics. Finally, as I mentioned in my comment above, you can in fact include and leverage multiple values for rdfs:label by using language tags for each value. There are examples of this publicly available on Bioportal and the CCO has been extended in this way by some users.
Including multiple values for rdfs:label will only be problematic (though I wouldn't say "useless") if care isn't taken in how they are handled. See my response to your first point.
Yes. This is, in fact, the point that the pro-alphanumeric IRI side has been making all along and is the solution to the above 2 issues.
Working with "raw" RDF in any number of scenarios is still my biggest complaint against switching from meaningful IRIs. I grant that it is often a pain due to the extra effort and lack of automated tools to make sense of the data; however, the question is whether this is sufficient reason to forego the benefits of switching to alphanumeric IRIs. The general consensus is that it is not. Hopefully, the community can develop simple effective tools to alleviate this burden.
The fact that many tools currently lack or only provide minimal support for leveraging rdfs:label or other annotation properties is unfortunate and is my other concern about switching to alphanumeric IRIs. However, some tools do exist and there are workarounds for a number of application scenarios. Additionally, by committing to this change, tool developers will be increasingly incentivized to improve existing tools or develop new ones -- and we certainly need more and better tools in this domain. Ultimately, I'm confident that this concern will be resolved in due time.

BrianHaugh commented 2 years ago

Let me elaborate more on my objections to using opaque URIs in response to some of the replies:

"Use of rdfs:label has the effect of establishing a common name for each element." The de facto practice in BFO and many of its derivatives, such as prior versions of the CCO, the Cyber Ontology, and the U.S. Army Operational Environment ontology is to provide a single value for rdfs:label annotation properties, which is used as a "standard name," for the corresponding element. This name is cited in the included definition of the class/property. Citing the names/labels of superclasses in definitions is a recommended practice by BFO. These labels and definitions are parts of the standard (if/when it is made a standard). Hence, it seems appropriate to acknowledge them as "standard" names.
Although one can distinguish different uses of rdfs:label via language tags or other annotation property annotations, that has not been done in those ontologies derived from BFO with which I am familiar (though I understand that some of the OBO foundry ontologies do this). If different language versions are used, do we expect all such variants to be incorporated into future versions of a standard or will only the "standard" English names and definitions be promulgated in a standard such as the proposed CCO? In any case, the proposed CCO even with opaque URIs does not have multi-lingual labels and definitions. So, such a standard will have "standard" human-comprehensible names and definitions in English, at least in the initial release.
Granted that it is possible to distinguish different labels formulated using rdfs:label by using annotations, such as the language tags. But, not all ontology tools and applications support displaying labels/names based on such tags. And, the language tags will not suffice to distinguish variations in same-language usage among different communities (e.g., different terms used for the same class concept by different armed services).
Different communities are free to add whatever alternative labels (using different annotation properties) that they would like to an ontology, regardless of whether or not it uses opaque URIs. Opaque URIs are not needed to support alternative labels/names for different communities. There is no great benefit to using opaque URIs in this regard. Such opaque URIs are not needed for any practical purpose, but only serve to address the feelings of some communities that might not like a "standard" English language term for concepts that they refer to differently. Some such communities might also object to the widespread use of English in international journals. Should we start using numeric URIs for concepts cited in journal articles - I don't think so :-).
When resolving errors or warnings from OWL reasoners, it is frequently necessary to use a text editor to find the source of the problem in the ontology file. But, it can be very difficult to recognize classes and properties that are using numeric URIs when using a text editor. Having an English language term that is readily recognizable greatly facilitates developers' recognition of the content of these files when resolving errors.
No text editors that I know of will automatically replace URIs with labels and then back again when you save the file :-). It would not really be a text editor if it did that. There is often a need for human developers to be able to read the OWL files in their native format (e.g., RDF/XML or Turtle) in order to find errors and correct them.
There are also other tools that do not display labels in their user interface. One may have limited or no choice in what tools are used in applications of ontologies. Some projects specify the use of certain tools and some applications are specified as parts of programs which developers have to use. I cited one NLP software tool, which was part of an information extraction project using an ontology based on BFO. It displayed the URIs in its interface, with no option to display labels. Developers and reviewers had no other option but to view the BFO numeric identifiers in this case. Not so bad with the limited number of BFO classes, but would be an incredible pain with all the CCO classes being opaque.
A community that prefers a different language over English would likely take offense at the BFO use of English throughout for labels, definitions, elucidations, editor's notes, and "axioms". The OBO Foundry even has a principle that "Labels and synonyms should be written in English". English has already been established as the standard language for BFO and many of its descendants (mid-level/domain ontologies). Just making the URIs opaque does very little to address this bias in BFO and related ontologies. Nor is there any need to address this "bias" since English has been recognized as the language of choice for international communications (e.g., in professional international journals).
I believe that non-opaque, human-readable, identifiers are most widely used in other ontologies, such as Cyc, SUMO, Dublin Core, and FOAF. The Open Biological and Biomedical Ontologies (OBO) Foundry is the only effort with which I am familiar that has actively promoted the use of such opaque URIs for ontology classes and properties. There is no need to follow their approach, which makes raw ontology files practically illegible to humans.

harefb commented 2 years ago

I will attempt to simplify this issue tremendously.

An international standard should be language agnostic. Imagine if the CCO had been developed by the Ethiopians. Whether it were written in Aramaic, or in alpha-numerics, like BFO, it is still the same effect to me. I, personally, wouldn’t be able to discern the meaning of the script. So I think that is just a requirement that should be a no-brainer for international standards.

Given the above point, I offer the following additional considerations:

I totally agree with Brian that the idea of “just use better tools” to address the challenges it will present for us is very naïve. Trying to deal with the BFO alpha-numerics is painful enough. We are NOT looking forward to having to deal with CCO alpha-numerics as well (but we will if they are international standards). Brian already provided a strong argument showing the difficulties so I won’t repeat. I will just summarize with the fact that the suggested conventions add even more complexity to a field that is already too complex for the average person to absorb. Dealing with that complexity takes resources that ultimately cost our user base time and money (everyone reading this is already expensive). Why extend the winter even longer if we don’t have to?
I think this issue is yet another reason to make the set of terms codified as an international standard as small as possible or at least practical.
For those who are interested, we don’t plan to make DICO an international standard. If there is a term in there that we think should be a standard, we will recommend it to CUBRC to add to CCO to standardize if they want. Therefore, we will maintain English entity names and labels.

Regards, Forrest

bdonohue29 commented 2 years ago

A few notes to respond to @BrianHaugh 's concerns:

Terms being used in definitions typically reflect the "ontologist-preferred terminology," which needn't be taken as a "standardized name," much less the only name, and certainly not as the most user-friendly term for human consumption (e.g. "Ratio Measurement Information Content Entity").
The significance of textual definitions can be misunderstood. A textual definition should express the essence of the entity, but it is still just a linguistic artifact. Accordingly, there can be different valid ways of linguistically expressing the same essence.
Terminological preferences evolve over time, very rapidly. URIs should not. In fact, they should be permanent, immutable. Accordingly, tying URIs to (current) terminological preferences poses the risk of needing to update the URI later, which is always a breaking change.
A URI is not for human consumption. It is for machine consumption.
The fact that other ontology efforts -- e.g., Cyc, SUMO, Dublin Core, FOAF -- use natural language URIs obviously does not tell us that this is in fact best practice. If anything, BFO, CCO, and the OBO Foundry are predicated on the belief that a lot of people are doing this wrong (for example, by conflating ontology and terminology).
There has not yet been much need for providing annotations in different languages. But this discussion is about the right principle of design, not necessarily immediate pragmatic demands.
International journals may tend to employ a common natural language, but they obviously do not enforce uniform terminology. Moreover, machines do not use ontologies the same way humans use actual language to communicate, not even in professional or academic settings. So I don't find the cases all that analogous.

BrianHaugh commented 2 years ago

Responses to comments are interspersed below.

Brian

A few notes to respond to @BrianHaugh 's concerns:

Terms being used in definitions typically reflect the "ontologist-preferred terminology," which needn't be taken as a "standardized name," and certainly not as most user-friendly term for human consumption (e.g. "Ratio Measurement Information Content Entity"). ** Terms in definitions and the definitions themselves provide the informal semantics in an ontology. They are essential to fully conveying the intended concepts used in an ontology to humans.
The significance of textual definitions can be misunderstood. A textual definition should express the essence of the entity, but it is still just a linguistic artifact. Accordingly, there can be different valid ways of linguistically expressing the same essence. ** Of course, different definitions can be equally valid, but they need to capture and convey the concept accurately. When one label and one definition is provided by a standard ontology, it is expected to be an accurate standard term and definition for the corresponding concept.
Terminological preferences evolve over time, very rapidly. URIs should not. In fact, they should be permanent, immutable. Accordingly, tying URIs to (current) terminological preferences poses the risk of needing to update the URI later, which is always a breaking change. ** An ontology with labels and definitions ties the URIs to terminological and definitional preferences. If these are changed in a way that changes the intended meaning of an ontology element, then the URI should change.
A URI is not for human consumption. It is for machine consumption. ** In many contexts (previously cited), humans need to be able to read the identifiers (e.g., URIs) for classes and properties from ontologies. That is why it is best for them to be human-readable. Otherwise, their intended meaning is “opaque” when viewed in isolation.
The fact that other ontology efforts -- e.g., Cyc, SUMO, Dublin Core, FOAF -- use natural language URIs obviously does not tell us that this is in fact best practice. If anything, BFO, CCO, and the OBO Foundry are predicated on the belief that a lot of people are doing this wrong (for example, by conflating ontology and terminology). ** Terminology and human-readable definitions are essential components of any good ontology. BFO and the OBO Foundry establish preferred terms and definitions for their concepts, even requiring them to be in English. CCO and other derivatives of BFO have been used for years without opaque identifiers. It has been argued that the opaque identifier bias of the OBO Foundry is not adequately justified. Other ontologies recognize the importance of human comprehension of ontology element identifiers.
There has not yet been much need for providing annotations in different languages. But this discussion is about the right principle of design, not necessarily immediate pragmatic demands. ** There has been no persuasive argument that opaque identifiers are the best principle for design. Pragmatic factors need to be taken into account in any good design.
International journals may tend to employ a common natural language, but they obviously do not enforce uniform terminology. Moreover, machines do not use ontologies the same way humans use actual language to communicate, not even in professional or academic settings. So I don't find the cases all that analogous.

alanruttenberg commented 2 years ago

[Previous version of comment was not finished. I accidentally hit the update button]

You have it exactly backwards if you think that what is being proposed here ignores the importance of human comprehension. The approach was designed specifically with the goal of enhancing human comprehension. The idea that there should be one label, in one language, for all communities, is ignoring the importance of communication by not recognizing the realities about how people communicate. Many words have multiple senses. Many classes are named differently in different disciplines. That is the fact of the matter.

About tools: We are attempting to build something that will be used in the long term. The tools we have now will be superseded over time. Should this take off within the DoD/IC community there will be acquisition and development of software where we set the standards.

Tools are supposed to work for us, not the other way around. I am sympathetic to your concerns but they are relatively short term and there are workarounds. I remain of the opinion that tools that ignore labels are deficient. If it was intended that the best practice was to use readable IRIs, why bother having rdfs:label included in the RDFS specification? Without a doubt, Protege has been used more than any other tool for building OWL ontologies. Its implementation, based on experience that is unmatched, displays terms based on a priority-ordered list of annotation properties. That is the architecture that other tools should be using as well.

If it seems as if there is no choice but to use certain tools then a pragmatic approach would be to use a script to translate the numeric IRIs to English IRIs. Use the reverse translation before sharing outside your environment. This will not be difficult to write and perhaps we can have it built and supported by the Foundry.
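
A minimal sketch of such a round-trip script, in Python, might look like the following. It is only an illustration under stated assumptions: the mapping entries and `example.org` IRIs are invented, a real mapping would be loaded from the project's mapping file, and only full IRIs written in angle brackets (as in Turtle or N-Triples) are handled, not prefixed names or RDF/XML attributes.

```python
import re

# Hypothetical mapping from opaque IRIs to locally readable ones.
OPAQUE_TO_READABLE = {
    "http://example.org/ont/0000052": "http://example.org/ont/Facility",
    "http://example.org/ont/0000085": "http://example.org/ont/Agent",
}

# Matches a full IRI enclosed in angle brackets, e.g. <http://example.org/x>.
IRI_IN_BRACKETS = re.compile(r"<([^<>\s]+)>")


def translate(text, mapping):
    """Replace each bracketed IRI that is a key of `mapping`; leave others as-is."""
    return IRI_IN_BRACKETS.sub(
        lambda m: f"<{mapping.get(m.group(1), m.group(1))}>", text
    )


def invert(mapping):
    """Build the reverse mapping, refusing mappings that are not one-to-one."""
    inverse = {v: k for k, v in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not one-to-one; cannot round-trip")
    return inverse
```

Translating with `OPAQUE_TO_READABLE` on the way in and with `invert(OPAQUE_TO_READABLE)` on the way out restores the original text, provided the mapping is one-to-one. Because whole bracketed IRIs are matched, a readable IRI that happens to be a prefix of another is not rewritten by mistake.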

That some of the early and small ontologies such as FOAF and DC used terminology was IMO damaging because it encouraged the idea that it made sense to edit ontologies and semweb data as text. It may seem simpler, but it doesn't scale. The most frequent errors I see are introduced by editing ontologies by hand instead of building them syntactically correct by construction, as one does in Protege. That SUMO and CYC use terminology in IRIs is not a signal that this is good practice. BFO and OBO ontologies represent a much larger body of ontology work and based on that experience have chosen to use opaque IRIs.

Finally, I would point out that some of us work with partners worldwide. Have a look at the EU's policy on language. I expect that as the scope of use increases there will be demand for translation. For many people in the world, looking at an English word is like us looking at an opaque identifier.

alanruttenberg commented 2 years ago

@BrianHaugh, to your point about journals, some journals do use opaque identifiers in exactly the way intended. The flat version of the paper uses language. But, as is pointed out, not all authors use the same language. The XML or other marked-up versions use the unique identifiers. Other papers use tables where they give the identifiers for the terms they use, usually as CURIEs, for brevity. When IRIs are resolvable, as is considered best practice, those identifiers effectively become part of the bibliography, with the linked-to pages providing more information about the terms.

The standard for identifying authors in research papers is to use an ORCID which is opaque. The standard for identifying papers is to use a DOI, which is opaque.

rorudn commented 2 years ago

I propose that the CCO adopt what I'll call a translucent identifier solution. This amounts to keeping the class names as they are for now, and when needed changing them using a numeric index. So cco:Facility changes to cco:Facility1.

This proposal is in large measure due to what I see as evidence that changing over to opaque identifiers will slow the uptake of the CCO in communities that provide the financial support for the development and maintenance of the CCO. For anyone who would disapprove of such a pragmatic reason I would respond by saying that it's always easier to bet when you're using someone else's money.

So how do I see this proposal addressing the reasons why this issue was raised in the first place?

Alan started the issue by citing three reasons...

"First, by using interpretable labels you potentially alienate or confuse users in different communities where terms are known by different names." -- that is why labels are to be used whether classes are designated using human-readable, translucent or opaque ids. This is addressed through adequate documentation as to where to look for the community name.

"Second, we want our ontologies to be used worldwide, and using english in IRIs is not welcoming to non-english speakers. The sanctioned mechanism for providing user readable labels is to use rdfs:label or skos properties, and literals with language tags." --Of course, we want our ontologies to be used worldwide and we are very appreciative of those users from around the world that have starred this repo but changing the class id to an opaque id is of limited value to this end when compared to providing a label for the term in another language. Let's place our efforts there.

"Third, there will inevitably be cases where words are spelled wrong, or disputed, which makes for pressure to "fix" the IRIs. Unfortunately, such fixes are typically breaking changes to users." I'll address this in two cases, one in which the extension of the class doesn't change and the second in which it does. The former would be a case such as when we've used "cco:Facility" for the name of the class but have been informed by our user community that the class as described in the definition is better named as "cco:Building". The argument in favor of opaque identifiers seems to be that if we had identified the class as "cco:0000052" the community debate would have centered around the label of the class instead and we could either change the label or add another one specifically for that community segment. But the same maneuver is available to us now; again it comes down to training and documentation: "Use the label, not the class identifier". In the case where the extension of the class changes (e.g. the class identified by "Agent" is extended from including persons and organizations to also including robots), then yes, this is a breaking change. But it's also a breaking change for a class designated by an opaque identifier. So under the current proposal, the class id would change from "cco:Agent" to "cco:Agent1" compared to "cco:0000052" to "cco:0000085" if using opaque identifiers.

This proposal does have precedent: email addresses and Twitter handles, to name two.

harefb commented 2 years ago

LOL. I’m afraid the people who are betting with other people’s money will still not be content with your solution as they comfortably read the proposal on other people’s time.

But I’m rooting for you!


BrianHaugh commented 2 years ago

No one has proposed limiting possible labels or definitions in different languages or for different communities. It is a red herring to suggest that we need to use opaque identifiers in order to facilitate alternative labels and definitions.

Tools like Protege use reasoners that identify problems in the source files, and resolving those problems can require viewing and/or editing the source files.

We cannot expect all applications to provide a convenient label-based interface to content using ontologies.

Brian

In reply to Alan Ruttenberg (Thu, Oct 7, 2021, 1:23 AM):

You have it exactly backwards if you think that what is being proposed here ignores the importance of human comprehension. The approach was designed specifically with the goal of enhancing human comprehension. The idea that there should be one label, in one language, for all communities, is ignoring the importance of communication by not recognizing the realities about how people communicate. Many words have multiple senses. Many classes are named differently in different disciplines. That is the fact of the matter.

About tools: We are attempting to build something that will be used in the long term. The tools we have now will be superseded over time. Should this take off within the DoD/IC community there will be acquisition and development of software where we set the standards.

Tools are supposed to work for us, not the other way around. I am sympathetic to your short-term concerns, but they are relatively short term and there are workarounds. That a focus on using labels in the way we have suggested is I remain of the opinion that tools that ignore labels are deficient. If it was intended that the best practice was to use readable IRIs, why bother having rdfs:label (https://www.w3.org/TR/rdf-schema/#ch_label) included in the RDFS specification? Without a doubt, Protege has been used more than any other tool for building OWL ontologies. Its implementation, based on experience that is unmatched, displays terms based on a priority-ordered list of annotation properties. That is the architecture that other tools should be using as well.

If it seems as if there is no choice but to use certain tools, then the pragmatic approach would be to use a script to translate the numeric IRIs to English IRIs. Use the reverse translation before sharing outside your environment. This will not be difficult to write, and perhaps we can have it built and supported by the Foundry.
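As a rough illustration of the translation script described above, here is a minimal Python sketch of a round trip between numeric and English identifiers. The mapping entries are invented for the example (they are not actual CCO assignments), and the sketch substitutes on raw text only for brevity; a real script would more safely operate on the parsed ontology.

```python
import re

# Hypothetical opaque-to-readable mapping; these pairs are invented for illustration.
OPAQUE_TO_READABLE = {
    "cco:0000052": "cco:Facility",
    "cco:0000085": "cco:Agent",
}
# The reverse table is only well defined if every readable name is unique.
READABLE_TO_OPAQUE = {readable: opaque for opaque, readable in OPAQUE_TO_READABLE.items()}


def translate(text, mapping):
    """Replace every whole identifier found in `mapping` with its counterpart."""
    # Longest keys first, plus boundary checks, so that "cco:Agent1" is never
    # rewritten as if it were "cco:Agent".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w:])(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], text)


turtle = (
    'cco:0000052 rdfs:label "Facility"@en .\n'
    'cco:0000085 rdfs:label "Agent"@en .'
)
readable = translate(turtle, OPAQUE_TO_READABLE)   # for local viewing
restored = translate(readable, READABLE_TO_OPAQUE)  # before sharing externally
print(readable)
print(restored == turtle)
```

The reverse step is what keeps the readable form a purely local convenience: anything shared outside the environment goes back to the numeric identifiers.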

That some of the early and small ontologies such as FOAF and DC used terminology in their IRIs was, IMO, damaging because it encouraged the idea that it made sense to edit ontologies and semweb data as text. It may seem simpler, but it doesn't scale. The most frequent errors I see are introduced by editing ontologies by hand instead of building them to be syntactically correct by construction, as one does in Protege. That SUMO and CYC use terminology in IRIs is not a signal that this is good practice. BFO and OBO ontologies represent a much larger body of ontology work and, based on that experience, have chosen to use opaque IRIs.

Finally, I would point out that some of us work with partners worldwide. Have a look at the EU's policy on language (https://en.wikipedia.org/wiki/Languages_of_the_European_Union#European_Commission). I expect that as the scope of use increases there will be demand for translation. For many people in the world, looking at an English word is like us looking at an opaque identifier.


BrianHaugh commented 2 years ago

Let me add that the Intelligence Community is not the only one already using the Common Core ontologies with their human-readable identifiers. Other ontologies derived from the Common Core include the Army's operational environment ontology and the ontology for attacks in cyber risk assessments. It would be a substantial disruption to existing applications using the Common Core ontology to have to change all the URIs to ones that are not human readable.

Brian

In reply to harefb (Wed, Oct 6, 2021, 11:11 AM):

I will attempt to simplify this issue tremendously.

An international standard should be language agnostic. Imagine if the CCO had been developed by the Ethiopians. Whether it were written in Amharic or in alpha-numerics, like BFO, the effect would be the same to me. I, personally, wouldn’t be able to discern the meaning of the script. So I think that is just a requirement that should be a no-brainer for international standards.

Given the above point, I offer the following additional considerations:

I totally agree with Brian that the idea of “just use better tools” to address the challenges it will present for us is very naïve. Trying to deal with the BFO alpha-numerics is painful enough. We are NOT looking forward to having to deal with CCO alpha-numerics as well (but we will if they are international standards). Brian already provided a strong argument showing the difficulties so I won’t repeat. I will just summarize with the fact that the suggested conventions add even more complexity to a field that is already too complex for the average person to absorb. Dealing with that complexity takes resources that ultimately cost our user base time and money (everyone reading this is already expensive). Why extend the winter even longer if we don’t have to?
I think this issue is yet another reason to make the set of terms codified as an international standard as small as possible or at least practical.
For those who are interested, we don’t plan to make DICO an international standard. If there is a term in there that we think should be a standard, we will recommend it to CUBRC to add to CCO to standardize if they want. Therefore, we will maintain English entity names and labels.

Regards, Forrest

In reply to Brian A Haugh (Wednesday, October 6, 2021, 1:00 AM):

Let me elaborate more on my objections to using opaque URIs in response to some of the replies:

Use of rdfs:label has the effect of establishing a common name for each element. The de facto practice in BFO and many of its derivatives, such as prior versions of the CCO, the Cyber Ontology, and the U.S. Army Operational Environment ontology is to provide a single value for rdfs:label annotation properties, which is used as a "standard name," for the corresponding element. This name is cited in the included definition of the class/property. Citing the names/labels of superclasses in definitions is a recommended practice by BFO. These labels and definitions are parts of the standard (if/when it is made a standard). Hence, it seems appropriate to acknowledge them as "standard" names. Although one can distinguish different uses of rdfs:label via language tags or other annotation property annotations, that has not been done in those ontologies derived from BFO with which I am familiar (though I understand that some of the OBO foundry ontologies do this). If different language versions are used, do we expect all such variants to be incorporated into future versions of a standard or will only the "standard" English names and definitions be promulgated in a standard such as the proposed CCO? In any case, the proposed CCO even with opaque URIs does not have multi-lingual labels and definitions. So, such a standard will have "standard" human-comprehensible names and definitions in English, at least in the initial release.
Granted that it is possible to distinguish different labels formulated using rdfs:label by using annotations, such as the language tags. But, not all ontology tools and applications support displaying labels/names based on such tags. And, the language tags will not suffice to distinguish variations in same-language usage among different communities (e.g., different terms used for the same class concept by different armed services).
Different communities are free to add whatever alternative labels (using different annotation properties) that they would like to an ontology, regardless of whether or not it uses opaque URIs. Opaque URIs are not needed to support alternative labels/names for different communities. There is no great benefit to using opaque URIs in this regard. Such opaque URIs are not needed for any practical purpose, but only serve to address the feelings of some communities that might not like a "standard" English language term for concepts that they refer to differently. Some such communities might also object to the widespread use of English in international journals. Should we start using numeric URIs for concepts cited in journal articles - I don't think so :-).
When resolving errors or warnings from OWL reasoners, it is frequently necessary to use a text editor to find the source of the problem in the ontology file. But, it can be very difficult to recognize classes and properties that are using numeric URIs when using a text editor. Having an English language term that is readily recognizable greatly facilitates developers' recognition of the content of these files when resolving errors. No text editors that I know of will automatically replace URIs with labels and then back again when you save the file :-). It would not really be a text editor if it did that. There is often a need for human developers to be able to read the OWL files in their native format (e.g., RDF/XML or Turtle) in order to find errors and correct them.
There are also other tools that do not display labels in their user interface. One may have limited or no choice in what tools are used in applications of ontologies. Some projects specify the use of certain tools and some applications are specified as parts of programs which developers have to use. I cited one NLP software tool, which was part of an information extraction project using an ontology based on BFO. It displayed the URIs in its interface, with no option to display labels. Developers and reviewers had no other option but to view the BFO numeric identifiers in this case. Not so bad with the limited number of BFO classes, but would be an incredible pain with all the CCO classes being opaque.
A community that prefers a different language over English would likely take offense at the BFO use of English throughout for labels, definitions, elucidations, editors notes, and "axioms". The OBO Foundry even has a principle that "Labels and synonyms should be written in English". English has already been established as the standard language for BFO and many of its descendants (mid-level/domain ontologies). Just making the URIs opaque does very little to address this bias in BFO and related ontologies. Nor is there any need to address this "bias" since English has been recognized as the language of choice for international communications (e.g., in professional international journals).
I believe that non-opaque, human-readable, identifiers are most widely used in other ontologies, such as Cyc, SUMO, Dublin Core, and FOAF. The Open Biological and Biomedical Ontologies (OBO) Foundry is the only effort with which I am familiar that has actively promoted the use of such opaque URIs for ontology classes and properties. There is no need to follow their approach, which makes raw ontology files practically illegible to humans.


APCox commented 2 years ago

@BrianHaugh wrote:

Let me add that the Intelligence Community is not the only one already using the Common Core ontologies with their human-readable identifiers. Other ontologies derived from the Common Core include the Army's operational environment ontology and the ontology for attacks in cyber risk assessments. It would be a substantial disruption to existing applications using the Common Core ontology to have to change all the URIs to ones that are not human readable.

This is why a translation document (and perhaps also an accompanying script) will be provided to make the transition easier for users. Perhaps this should be continued beyond the initial implementation period so users who don't want to use opaque IRIs don't have to. This solution will shift the burden from those users onto the CCO development team to essentially maintain 2 versions of the CCO, although only the opaque IRI version will be the official standard.

alanruttenberg commented 2 years ago

@rorudn Having information on how to find the labels still allows tools to do the wrong thing: display only by IRI. It means that even if there is a preferred term for a community or a language, the tools that are broken in the way they are now will continue to be broken, because they will still ignore the labels. The "translucent" IRI proposal, e.g. having Facility and Facility1 in IRIs, manages to bring together the worst aspects of the two proposals. Now there are both meaningless numbers AND terminology in IRIs.

Here's a common scenario; it just happened the other day browsing Collibra. You get a pick list of terms with multiple choices spelled the same way and with no easy way to figure out which is which. If tools support flexible labels, at least the user has a chance to fix this. Think of the word System. Any sufficiently large collection of ontologies will have several terms with that label but with differing definitions. Drawing a graph, you might see multiple nodes labeled "system" but not know which is which. However, within a smaller community the word "system" would be understood because it isn't used ambiguously there, and labels that are also unambiguous within that community could be added to the other "system" terms.

Or, if we adopt as a common practice an editor-preferred label, enforced to be unique across the Foundry, then at least we could switch to that set of labels if the need arose.
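To make the "System" scenario concrete, here is a small Python sketch that flags labels shared by more than one term, which is the check a uniqueness rule for editor-preferred labels would need. The identifiers, labels, and definitions are all made up for the example, and case-insensitive comparison is an assumption about how strict such a rule would be.

```python
from collections import defaultdict

# Hypothetical terms from a merged collection of ontologies: (ID, label, definition).
TERMS = [
    ("ex:0000001", "System", "A set of interacting components forming a whole."),
    ("ex:0000002", "system", "An installed software product."),
    ("ex:0000003", "Facility", "A building or site designed for some function."),
]


def ambiguous_labels(terms):
    """Group term IDs by case-folded label and keep only labels used more than once."""
    groups = defaultdict(list)
    for term_id, label, _definition in terms:
        groups[label.casefold()].append(term_id)
    return {label: ids for label, ids in groups.items() if len(ids) > 1}


for label, ids in ambiguous_labels(TERMS).items():
    print(f"label '{label}' is shared by: {', '.join(ids)}")
```

Any label reported here is one where a pick list or graph view would show indistinguishable entries unless the tool can switch to another label set.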

swartik commented 2 years ago

@BrianHaugh wrote:

Let me add that the Intelligence Community is not the only one already using the Common Core ontologies with their human-readable identifiers. Other ontologies derived from the Common Core include the Army's operational environment ontology and the ontology for attacks in cyber risk assessments. It would be a substantial disruption to existing applications using the Common Core ontology to have to change all the URIs to ones that are not human readable.

There are arguments to be made in favor of natural-language IRIs, but I find this one problematic. Any time a major version of an ontology is introduced, applications using that ontology must be thoroughly reviewed to ensure they aren't relying on out-of-date semantics. Witness @mark-jensen's proposed changes to stasis and change. I expect organizations will find switching to opaque IRIs largely mechanical. Semantic reviews are not.

alanruttenberg commented 2 years ago

@apcox I don't think it would be much of a burden. All the information needed to make the translation is in the ontology. A SPARQL query returns pairs of IRI and whichever label is desired, and then the labels (possibly mangled to avoid restricted characters in IRIs) are used to construct the terminology-based IRI. These could even be kept as the value of an extra annotation property. Ideally a script would verify that each terminology-based IRI is used only once.
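The steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the rows stand in for SPARQL query results, the base namespace is a placeholder rather than the real CCO namespace, the numeric IDs and labels are invented, and CamelCase is just one possible way to mangle labels.

```python
import re
from collections import defaultdict

BASE = "http://example.org/cco/"  # placeholder namespace, not the real CCO base IRI

# Stand-in for the rows a query such as
#   SELECT ?iri ?label WHERE { ?iri rdfs:label ?label }
# would return. IDs and labels are invented for illustration.
ROWS = [
    (BASE + "0000052", "Facility"),
    (BASE + "0000085", "Agent"),
    (BASE + "0000101", "Act of Communication"),
]


def mangle(label):
    """Turn a label into an IRI-safe CamelCase local name."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    return "".join(w[:1].upper() + w[1:] for w in words)


def build_mapping(rows, base=BASE):
    """Map each opaque IRI to a label-based IRI, rejecting any collision."""
    by_readable = defaultdict(list)
    for iri, label in rows:
        by_readable[base + mangle(label)].append(iri)
    clashes = {r: iris for r, iris in by_readable.items() if len(iris) > 1}
    if clashes:
        raise ValueError(f"label-based IRIs used more than once: {clashes}")
    return {iris[0]: readable for readable, iris in by_readable.items()}


for opaque, readable in build_mapping(ROWS).items():
    print(opaque, "->", readable)
```

Failing loudly on a collision, rather than silently appending a suffix, keeps the generated names reversible, which matters if they are to be translated back before sharing.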

alanruttenberg commented 2 years ago

@swartik If a change in meaning happens, the old terms and IRIs are kept as deprecated and new IRIs are minted for the new ones. That's so users don't end up with dangling pointers. Making a change such as the one you cite may or may not change the meaning. Some changes are fixes that bring the term closer in line with what the author meant. I haven't reviewed @mark-jensen's proposal, so I can't speak to which sort of change it would be. The point of doing things this way is that one doesn't have to re-examine old terms. One only has to review newly deprecated and added terms.
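A minimal Python sketch of that workflow, assuming each release can be reduced to a table of term records. The records and IDs here are hypothetical; "deprecated" mirrors owl:deprecated, and "replaced_by" stands in for whatever replacement annotation an ontology actually uses.

```python
# Hypothetical term tables for two releases, keyed by opaque ID.
OLD_RELEASE = {
    "cco:0000052": {"label": "Agent", "deprecated": False, "replaced_by": None},
}
NEW_RELEASE = {
    "cco:0000052": {"label": "Agent", "deprecated": True, "replaced_by": "cco:0000085"},
    "cco:0000085": {"label": "Agent", "deprecated": False, "replaced_by": None},
}


def resolve_current(term_id, terms):
    """Follow replacement links from a possibly deprecated term to a live one."""
    seen = set()
    while terms[term_id]["deprecated"]:
        if term_id in seen:
            raise ValueError(f"replacement cycle at {term_id}")
        seen.add(term_id)
        replacement = terms[term_id]["replaced_by"]
        if replacement is None:
            raise LookupError(f"{term_id} is deprecated with no replacement")
        term_id = replacement
    return term_id


def terms_to_review(old_terms, new_terms):
    """Return IDs that were added, or that became deprecated, in the new release."""
    added = [t for t in new_terms if t not in old_terms]
    newly_deprecated = [
        t for t in new_terms
        if t in old_terms and new_terms[t]["deprecated"] and not old_terms[t]["deprecated"]
    ]
    return sorted(added + newly_deprecated)


print(terms_to_review(OLD_RELEASE, NEW_RELEASE))
print(resolve_current("cco:0000052", NEW_RELEASE))
```

Because the old ID still exists in the new release, data that points at it does not dangle; it can be migrated by following the replacement link.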

alanruttenberg commented 2 years ago

@BrianHaugh Would it be possible to share the list of tools you are using that can't display labels, and to share some of the errors you have had that expose bare IRIs, along with a rough idea of how frequently such errors occur?

Thanks, Alan

jonathanvajda commented 2 years ago

If I could add to the list of reasons for opaque IDs:

As Alan says above, human-readable terms frequently don't enhance human comprehension without pragmatically assuming (legitimately or illegitimately) that no other senses could be meant. CCO:Agent, for example, could refer to a chemical agent, a special agent, or a conscious being with a capability for intentional acts, or an aggregate of conscious beings with a capability for intentional joint acts. It is unfortunate that such a term is used when it has so many senses. Human readers cannot know what is meant without looking at the hierarchy or definition before comprehending what class is in view. It isn't even known simply by knowing that it is in a mid-level ontology.
Opaque-IDs can have labels with multiple language designators. It isn't hard to ensure you have software that displays the label in your preferred language. This is similar to what some have said above.
A community may want to replace a label to be more specific or because language conventions have changed, but keep the class otherwise unchanged. This seems preferable to deprecating an otherwise unproblematic term.
Translucent IDs can inherit the problems I've raised above.
Deprecating terms is infinitely more practical with opaque-IDs. Translucent IDs don't solve this either.
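As a sketch of the point about language designators, this is roughly what "display the label in your preferred language" amounts to in Python. The label table is invented for the example (including the translations), and falling back to any available label before showing the bare ID is a design assumption, not a description of any particular tool.

```python
# Hypothetical language-tagged labels for one opaque ID: (text, language tag).
LABELS = {
    "cco:0000052": [("Facility", "en"), ("Einrichtung", "de"), ("Installation", "fr")],
}


def display_label(term_id, labels, preferred=("en",)):
    """Return the first label matching the preference order, else a fallback."""
    candidates = labels.get(term_id, [])
    for lang in preferred:
        for text, tag in candidates:
            if tag == lang:
                return text
    # No preferred language available: use any label, and the bare ID only
    # when the term has no labels at all.
    return candidates[0][0] if candidates else term_id


print(display_label("cco:0000052", LABELS, preferred=("de", "en")))
print(display_label("cco:0000052", LABELS, preferred=("ja",)))
print(display_label("cco:9999999", LABELS))
```

The same ID serves every community; only the preference list changes.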

jimschoening1 commented 1 month ago

I now have a stake in this decision, so I plan to make the motion: 'P3195.1 Common Core Ontologies and all P3195.1.X extension ontologies convert to opaque identifiers using the namespace https://purl.ieee.org/sa/cco/.'

(Note: I recuse myself [on this topic only] and ask Cameron More to act as OSWG Chair.)

I believe this means: CCO IRI http://www.ontologyrepository.com/CommonCoreOntologies/ would be replaced by the PURL https://purl.ieee.org/sa/cco/, which would point to that class in the latest version of CCO at https://opensource.ieee.org/cco/CommonCoreOntologies/

I believe this deprecates use of http://www.ontologyrepository.com, which I don't see we need anymore.

Here's my stake: I was recently appointed Co-Chair of Credential Schemas Work Item under Decentralized Identity Foundation (DIF), which has reached consensus to include CCO+domain PURLs in their standard schemas for Verifiable Credentials, but we need real PURLs that should never change. This holds great opportunity for CCO and domain ontologies gaining real world adoption in the rapidly emerging field of Verifiable Credentials (VC), which DIF is a leading player in. Early adoption in this field will also provide our ontologies with bottom-up requirements plus validation of our draft content, which are essential if we ever want to pass any standards.

BrianHaugh commented 1 month ago

I have strong interests in this decision since

I find it very difficult to understand what the problems are when debugging errors from an inconsistent ontology if the identifiers are not understandable, e.g., “opaque”. Inconsistencies can readily arise when merging new ontology files with others or just in the course of ontology development.
It is also very difficult to follow what has changed between versions of an ontology when examining a diff between them in which some/many of the identifiers are opaque. Diffs are commonly used to identify ontology updates in GitHub/GitLab.
Use of opaque identifiers violates the DoD Data Strategy key goal to “make data understandable.”

It is also quite feasible to have stable understandable identifiers, even though labels may change.

For more details on my position, see the attached file.

I regret that I will probably be unable to participate in this planned discussion, assuming that it is held during the OSWG meeting this week since I will be on vacation in Costa Rica.

Brian



alanruttenberg commented 1 month ago

@jimschoening1 The reason to keep www.ontologyrepository.com (or some other domain that is controlled by the developer community) is that while IEEE, like any organization, has the best of intentions, stuff happens in organizations, and down the road there may come a time when they no longer host the PURL server. By having the ontology developers/steering committee own the domain, that eventuality can be handled by standing up an alternative PURL server and directing the domain to it.

This is not a theoretical concern. OCLC, who ran the PURL server OBO initially used, was an equally reputable and trustworthy organization but, due to internal decisions we were not privy to, stopped supporting it. The only thing that let our PURLs continue to be resolved was the forethought in deciding to own our own domain name, and our negotiation with OCLC to be able to use it as a CNAME for their PURL server.

Please don't try to reassure me that IEEE would never do this. It's a big complicated organization and our involvement with them is a tiny thing. Shit happens.

The cost of a domain name is negligible and the protection it gives in the face of unforeseeable changes is very valuable.

jonathanvajda commented 1 month ago

@jimschoening1 wrote

I now have a stake in this decision, so I plan to make the motion: 'P3195.1 Common Core Ontologies and all P3195.1.X extension ontologies convert to opaque identifiers using the namespace https://purl.ieee.org/sa/cco/.'

Given how the initial reactions have come in, I think this might be a good reason to keep these as distinct proposals that, if separately agreed upon within a set time frame, are rolled out at the same time. There is a publicly available CCO git dev branch with the CCO opaque IRIs (and I know of some project that uses it, at their own risk): https://github.com/CommonCoreOntology/CommonCoreOntologies/tree/numeric-iris

Here's my stake: I was recently appointed Co-Chair of Credential Schemas Work Item under Decentralized Identity Foundation (DIF), which has reached consensus to include CCO+domain PURLs in their standard schemas for Verifiable Credentials, but we need real PURLs that should never change. This holds great opportunity for CCO and domain ontologies gaining real world adoption in the rapidly emerging field of Verifiable Credentials (VC), which DIF is a leading player in. Early adoption in this field will also provide our ontologies with bottom-up requirements plus validation of our draft content, which are essential if we ever want to pass any standards.

I am strongly in favor of the ends (having a PURL server, dedicated PURLs, adoption in VC and DIF). I have no strong feelings about the means.

@BrianHaugh wrote:

I find it very difficult to understand what the problems are when debugging errors from an inconsistent ontology if the identifiers are not understandable, e.g., “opaque”. Inconsistencies can readily arise when merging new ontology files with others or just in the course of ontology development.

This is a limitation of the tooling, not of the IRIs. I think this is an argument in favor of getting better software, where debugging with resolved labels is a functional requirement. For example, whenever I do a quality control check (whether with SPARQL queries, RDFLib, or ROBOT), I want all of the IRIs for the antimodel (QC violation) to come with the rdfs:label and the cco:is_curated_in_ontology, where those annotations are available. We need tools that display these and other annotations by default.
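
To make the requirement concrete, here is a minimal sketch in plain Python of a QC report that attaches annotations to each violating IRI. It works over a list of (subject, predicate, object) tuples rather than a real RDFLib graph or SPARQL endpoint, and the IRIs below (including the stand-in for cco:is_curated_in_ontology, whose full IRI is not given in this thread) are assumptions for illustration only.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Assumption: placeholder IRI standing in for cco:is_curated_in_ontology.
CURATED_IN = "https://example.org/cco/is_curated_in_ontology"

def annotate_violations(violation_iris, triples):
    """Return one dict per violating IRI with whatever annotations exist."""
    labels, curated = {}, {}
    for s, p, o in triples:
        if p == RDFS_LABEL:
            labels.setdefault(s, o)   # keep the first label seen
        elif p == CURATED_IN:
            curated.setdefault(s, o)
    return [
        {"iri": iri, "label": labels.get(iri), "curated_in": curated.get(iri)}
        for iri in violation_iris
    ]

# Hypothetical data: one annotated term and one term with no annotations.
triples = [
    ("https://example.org/cco/ont00000001", RDFS_LABEL, "Example Class"),
    ("https://example.org/cco/ont00000001", CURATED_IN,
     "https://example.org/cco/ExampleOntology"),
]
report = annotate_violations(
    ["https://example.org/cco/ont00000001",
     "https://example.org/cco/ont00000002"],
    triples,
)
for row in report:
    print(row["iri"], "|", row["label"] or "(no label)",
          "|", row["curated_in"] or "(unknown)")
```

Missing annotations are reported as gaps instead of being silently dropped, which is itself useful QC output.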

I'll be frank on this: if I can get any money toward software development, it will go toward lessening the impact of the design choice between opaque and human-readable IRIs. I take it to be one of the main things we need to solve for our community. Better tooling, not worse IRIs.

It is also very difficult to follow what has changed between versions of an ontology when examining a diff between them in which some/many of the identifiers are opaque. Diffs are commonly used to identify ontology updates in GitHub/GitLab.

I'll admit, I haven't had this negative experience with diffs. Do you have an example we can see? Meanwhile, I am familiar with two styles of diffs. The first is the one CCO provides; couldn't it easily come with the rdfs:label values attached? The second is Git's native diff on commits (squashed or unsquashed), which usually shows the rdfs:label a few lines above or below the line that was changed. So I'm not sure how severe this is for workflows and for reviewing/approving diffs.

[edit] Okay, I have a good example. Subclass axioms might be added, changed, or whatever. In such cases, you'd need to find the other lines of code where the IRIs' annotations are. Good point. I'd say better tooling is the issue.
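
One possible shape for that tooling, as a hedged sketch: a small Python filter that appends known labels to the changed lines of a unified diff. It assumes IRIs appear in angle brackets (as in Turtle or N-Triples) and that a label lookup has already been built from the ontology; the file name, IRIs, and labels are hypothetical.

```python
import re

# Matches anything written as <...> with no whitespace inside.
IRI_PATTERN = re.compile(r"<([^<>\s]+)>")

def annotate_diff(diff_text, labels):
    """Append known labels as a trailing comment on added/removed lines."""
    out = []
    for line in diff_text.splitlines():
        # Skip the '+++' / '---' file headers; annotate only real changes.
        is_change = (line.startswith(("+", "-"))
                     and not line.startswith(("+++", "---")))
        if is_change:
            found = [labels[iri] for iri in IRI_PATTERN.findall(line)
                     if iri in labels]
            if found:
                line = f"{line}  # " + "; ".join(found)
        out.append(line)
    return "\n".join(out)

# Hypothetical label lookup and diff.
labels = {
    "https://example.org/cco/ont00000001": "Example Child",
    "https://example.org/cco/ont00000002": "Example Parent",
}
diff_text = "\n".join([
    "--- a/example.ttl",
    "+++ b/example.ttl",
    "@@ -1,0 +1,1 @@",
    "+<https://example.org/cco/ont00000001> rdfs:subClassOf "
    "<https://example.org/cco/ont00000002> .",
])
annotated = annotate_diff(diff_text, labels)
print(annotated)
```

With something like this in a review workflow, an added subclass axiom reads as "Example Child; Example Parent" without anyone hunting for the annotation lines elsewhere in the file.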

I'll raise you another. SPARQL queries, SHACL constraints, and R2RML mappings ... All of these tend to have even worse tooling for resolving rdfs:label.

Use of opaque identifiers violates the DoD Data Strategy key goal to “make data understandable.”

I contend that the use of our many human-readable annotations on IRIs (opaque or not) satisfies the DoD Data Strategy key goal to "make data understandable." The DoD-IC's relations working group is enumerating the functional requirements of annotations for IRIs to make data understandable. If so, this might make the opaque/human-readable debate irrelevant. (Or, what I think is more likely, to the extent that it is relevant, opaque IRIs promote understandability in the long run, in the ways explained earlier in this thread.)

jimschoening1 commented 1 month ago

Alan, I agree IEEE is not perfectly trustworthy, but who is this developer group, and are they a more trustworthy entity? If IEEE decided to defund the PURL server, I believe OSWG members (are we not the developer group?) would find a way to keep it going. Ron owns the domain now. If we kept it, shouldn't IEEE own it?

alanruttenberg commented 1 month ago

The people who are most invested in making sure the ontology works are the developers and users. There will be a steering committee for Common Core and they are the ones that should control the domain. Certainly we could have IEEE as one of the administrators. But having them own it puts the domain in the same situation as an IEEE domain. If things go to hell, getting back control is dicey.

In IRIs, use opaque identifiers instead of english labels #105