TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Update ISOCat reference to DatCatInfo #2227

Closed: JanelleJenstad closed this issue 1 year ago

JanelleJenstad commented 2 years ago

From Roberto Rosselli Del Turco via TEI-L (2022-02-02): "in the Guidelines section devoted to Dictionaries there's one reference to the ISOCat standard, but the latter has been superseded by DatCatInfo: http://www.datcatinfo.net/ (with a more precise URL of course) instead of http://www.isocat.org/ in https://tei-c.org/release/doc/tei-p5-doc/en/html/DI.html#index-egXML-d52e79215."

bansp commented 2 years ago

I can try to take this one, because it requires a bit of crafting. The new standard defines a somewhat different entity with a somewhat different role, so a direct substitution might not be the best way to handle it. But I've just updated an ISO standard with respect to its references to the dead ISOCat (even Heisenberg wouldn't be uncertain here, sadly), so I think I can handle this one as well. I suspect I may actually have had this assignment on a very old plate somewhere among the grey TEI issues -- gonna have a look now.

bansp commented 2 years ago

OMG, look at that... #232, #1089, #1866. It's time... :-)

martinascholger commented 2 years ago

Hm, I thought I had already implemented #1866. I will investigate why I did not.

bansp commented 2 years ago

Oops, the middle issue that I listed above is actually closed. Well, @martinascholger, please ping me if you decide that I can be of use. My instinct would be to look at all the text fragments that mention the mechanism, remove the recommendation to use ISOCat and definitely not replace it with a recommendation of the privately owned DatCatInfo, but rather generalise. The original 'mistake' was to recommend ISOCat as if it were the only such service, while it should simply have been treated as an example of an external reference taxonomy. Its use goes beyond language-related applications, too.

JanelleJenstad commented 2 years ago

@bansp: Can you recommend some wording that would allow us to indicate the need to refer to a standard without mentioning datcat?

Here are the sections of the Guidelines that currently contain references to ISOcat:

Attribute class: att.datcat

Element spec: <gram>

Example in 18.3

Text of 18.3: "Whether at the level of feature-system declarations, feature- and feature-value libraries, or individual features, it is possible to align both feature names and their values with standardized external data category repositories such as ISOcat. In the following example, both the feature part_of_speech and its value #commonNoun are aligned with the respective definitions provided by [ISO DCR (Data Category Registry)], as implemented by ISOcat."

Note 82 in 18.3

Text of 9.5.2: "The TEI provides means to align grammatical categories as well as their content with the ISOcat reference, which is a Web implementation of [ISO 12620]. / In the example below, a fragment of the entry for isotope cited in section 9.3.2 Grammatical Information is adorned by references to ISOcat definitions for "part of speech" (dcr:datcat) and "adjective" (dcr:valueDatcat). Depending on the status and extent of the dictionary, various strategies may be used to reduce the redundancy of the repeated ISOcat references."

Example in 9.5.2

bansp commented 2 years ago

I can submit a PR some time after this week. Will try to keep this in sight. Cheers!

JanelleJenstad commented 2 years ago

Thanks @bansp! We refrigerate this weekend. We can sneak these changes into the upcoming release if you are able to do them early next week.

bansp commented 2 years ago

I'd love to say "challenge accepted", but I am unable to make promises at this point. Will try my best though, knowing the stakes. :-)

ebeshero commented 1 year ago

Council F2F: @bansp We're approaching another release in October, so we're hoping perhaps to fix this by then. Can you help?

bansp commented 1 year ago

Heck, yes, @ebeshero, and thanks for the ping -- I'll handle this after my Wednesday presentation and before Saturday morning. Funny that I thought about this issue (and a few other old promises of mine) maybe two minutes before seeing your message.

martindholmes commented 1 year ago

@bansp Can we discuss this at the Ling SIG this afternoon? Seems like an appropriate venue since this is a matter for linguists.

martindholmes commented 1 year ago

At Ling SIG, @bansp gave us a really helpful overview of the situation here and we think every current reference in the Guidelines to ISOCat should be replaced by a generic recommendation to point to a data category repository which ideally conforms with ISO 12620 if appropriate.

martindholmes commented 1 year ago

Following discussion with @peterstadler and @bansp: @bansp will revise the spec page for att.datcat to serve as an example, and make a pull request; then Council can track down and revise all other mentions and invocations of ISOCat throughout the Guidelines and fix them following @bansp's example.

bansp commented 1 year ago

While I'm nibbling on this, a sub-issue has struck me, namely the matter of the dcr namespace, which is "http://www.isocat.org/ns/dcr" (recall that it's dcr:datcat). The namespace can be taken as a URI, whether or not it resolves to anything in the future (at the moment it doesn't). But it's not possible to predict which authority is going to control it in the future, unless someone (CLARIN, TEI-C) were to purchase the domain, and I don't think either will care to. We could ignore the authority, just as we could ignore the dead link, but we know that some users are going to keep asking about it.

One solution could be to ask the CLARIN Standards Committee to assign a "clarin.eu"-based namespace URI for the dcr: prefix. But this solution, from the point of view of the TEI, is not necessarily optimal, because it still delegates the authority to an external party. The CSC maintains two such URIs for ISO at the moment, and you can sense that there is a certain degree of randomness in this: https://www.clarin.eu/content/standards#namespace-assignment .

Another solution, which at this point seems to me pretty optimal (under the circumstances), is to deprecate the dcr prefix and to place the two attributes directly in the TEI namespace. The presence of the prefix was justified by the fact that the old DCR standard that used ISOCat explicitly defined these attributes. However, the new DCR standard ("ISO 12620-1, Management of terminology resources - Data categories - Part 1: Specifications") does not even mention the namespace bound to the dcr prefix. And if it were to mention it (which I don't think is going to happen in the foreseeable future), it wouldn't use the string "isocat" as the basis, because the TC37 secretariat is oversensitive to the substring "iso" and will certainly not approve of "isocat". So it would use something else (though I bet it won't use anything at all). The standard mentions the dcr:datcat attribute as an example of how the TEI utilizes the attributes, rather than of how these attributes should be used in XML documents. In other words, the standard has dissociated itself from the datcat attributes, merely acknowledging them as a TEI mechanism. ("Convoluted!", you will exclaim. So it is.)

"Oh no," you will go on, "the dcr prefix is mentioned by an ISO standard, so we have to keep it!" -- no worries. The editorial procedure of removing "dcr:" from one example in the standard (and it's only used in one example) is a matter of a microrevision, which the relevant working group will be able to perform in a month or two, if it tries -- and I can inform them of the need, if such a need arises.

I see the deprecation of the prefix as "nativizing" the DCR mechanism by the TEI. If that move were approved by the Council, I would be happy to use the unprefixed attributes in the version of ISO MAF that is about to be submitted for the committee ballot. Perhaps there is a chance for the Council to address this in the time remaining in Newcastle?
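To make the proposed deprecation concrete, here is a minimal Python sketch (standard library only) of how an encoder might migrate legacy documents: every attribute in the old dcr namespace is rewritten as an unprefixed attribute with the same local name. This is an assumption about one possible migration path, not the procedure used in the Guidelines or in any PR; the function name `drop_dcr_prefix` and the example.org data-category URIs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Legacy namespace bound to the dcr: prefix, as quoted in this thread.
DCR_NS = "http://www.isocat.org/ns/dcr"
TEI_NS = "http://www.tei-c.org/ns/1.0"


def drop_dcr_prefix(root: ET.Element) -> int:
    """Hypothetical helper: move every {DCR_NS}name attribute to an
    unprefixed 'name' attribute and return how many were removed.

    Assumption: if an unprefixed attribute with the same local name
    already exists, it is kept and the dcr: value is discarded.
    """
    prefix = "{" + DCR_NS + "}"
    removed = 0
    for elem in root.iter():
        # Copy the items so the attribute dict can be changed while looping.
        for key, value in list(elem.attrib.items()):
            if key.startswith(prefix):
                del elem.attrib[key]
                elem.attrib.setdefault(key[len(prefix):], value)
                removed += 1
    return removed


# Hypothetical legacy fragment using the dcr: prefix.
SAMPLE = f"""<gramGrp xmlns="{TEI_NS}" xmlns:dcr="{DCR_NS}">
  <pos dcr:datcat="https://example.org/dc/partOfSpeech"
       dcr:valueDatcat="https://example.org/dc/adjective">adj</pos>
</gramGrp>"""

if __name__ == "__main__":
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace
    root = ET.fromstring(SAMPLE)
    print(drop_dcr_prefix(root))  # 2
    print(ET.tostring(root, encoding="unicode"))
```

Because ElementTree only declares the namespaces that are actually used when serializing, the `xmlns:dcr` declaration disappears from the output once no attribute refers to it.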

martindholmes commented 1 year ago

I agree with removing the prefix and the namespace. That will make the attributes seem more generally applicable for people who want to point at their own data categories. @laurentromary do you have any thoughts on this?

laurentromary commented 1 year ago

Yes, that's a good option!

ebeshero commented 1 year ago

@rettinghaus @bansp @sydb @martindholmes @martinascholger I was working on #2340 about inconsistent ISO referencing, and stumbled into the datcat question on my own. Can we do one of the following here?

ebeshero commented 1 year ago

Having reviewed the ticket and recalled our conversations, I see that the link update is precisely what we want to avoid. Sorry for barging in on the back of another ticket! Anyway, I'll concentrate on the easier updates in #2340 and await the PR.

ebeshero commented 1 year ago

But I wonder if we can, for the moment, just remove the references to http://isocat.org/ as preparation for the coming PR, since we know we shouldn’t be pointing to it at all. We seem agreed here not to be pointing to a standard, and the idea of “nativizing” data categories (and perhaps other things for which we used to rely on ISO) seems the path for TEI.

bansp commented 1 year ago

If there is no rush to remove the references to isocat today as opposed to a week ago, may I ask for them to be left in place for now? I'm going over them all (I know, more than I was asked for, but it's hard to leave them alone if I can handle them), and I already anticipate some conflicts with my version even before I submit the PR. Not increasing the amount of extra work spent on resolving conflicts would be very welcome, because I'm racing against several clocks (but this item is my priority now).

ebeshero commented 1 year ago

Got it--thank you, @bansp. I won't touch the isocat links and will leave this to you. I'd like Council to review my table of proposed ISO citation updates anyway before I do anything more on #2340. Let us know if we can help with anything on your end!

bansp commented 1 year ago

The result is in PR #2359 , spread mostly across Specs/att.datcat.xml and the FS and DI chapters, with some little extras. The description in the att.datcat spec is a bit verbose... but I tried to gather the various ways in which the attributes can be used, with examples. Eliminated references to ISOCat from FS, DI and the <gram> spec. Added some bits to the FS chapter in order for the individual examples to start making sense when combined. The results can also be seen directly in:

Note: the three documents above invoke the Paderborn version of the TEI schema, with the dcr: namespace eliminated.

I hope the result is acceptable (I still need to check if I have introduced any layout mess in FS; but will catch some sleep first). I am of course willing to work on improving/rearranging the info even if something close to its current version gets merged for the upcoming release. Cheers!

bansp commented 1 year ago

This is, first, to register my thanks to @sydb for the wonderful amount of work he has put into his review of the PR. I prefer to do this here rather than within the PR itself, which is now lengthy and whose persistence I'm not sure about. Extremely helpful, and I think I'm learning even some small things, like whether I really want to use a solidus, or whether using one is just admitting my laziness... There's also a practical question attached to this, though: how real is the chance of making it into the release? I need to spend some time on a yearly report today, and I'm not sure how long the fixes are going to take. Some are, admittedly, quick, but some points have made me realise that I had managed to tuck an entire sub-mechanism of the TEI aside in my brain while trying to handle the issue quickly, despite the documentation for prefixDef still being open in one of my tabs. Wonderful stuff, and I'll be happy to address it, but is there still a chance that, having put another night into this, I will see the PR go into the current release? Or will I then learn that I should have taken a deep breath already at this point and calmly scheduled this work item among the others that are winking at me from my calendar? Are we still in the process of the current release, @peterstadler, please?