COST-ELTeC / Schemas

Contains RELAXNG schemas used to validate ELTeC texts

More structured way of handling authority identifiers in the header #3

Open christofs opened 4 months ago

christofs commented 4 months ago

We currently have no really transparent, easy-to-parse way of providing authority identifiers such as VIAF and Wikidata ids, as noted also in #1. One solution for this that does not require a prefixDef would be something along the following lines:

<titleStmt>
<title>...</title>
<author>
<name>last, first</name>
<idno type="wikidata" corresp="https://wikidata.org/wiki/">Q0123456789</idno>
</author>
</titleStmt>

Similarly, for editors or other people, and with an alternative structure:

<respStmt>
<resp>publisher</resp>
<name>Trier University</name>
<idno type="ROR" corresp="https://ror.org/02778hg05"/>
</respStmt>
<respStmt>
<name>Julia Röttgermann</name>
<idno type="ORCID" corresp="https://orcid.org/0000-0002-1918-8117"/>
</respStmt>

I'm not insisting on any of the attributes or particular structures, and happy to see alternative solutions. But what would be nice is to be able to use XPath without a lot of tricks (like looking up base URLs somewhere else depending on the value of an attribute) and without context knowledge (such as base URLs) in order to automatically follow the links implied by these identifiers. I'd be happy to accept some verbosity or even redundancy to make this possible.
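
To illustrate the goal of following the links with nothing but a path expression, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the two structures proposed above (a base URL in @corresp plus the identifier as content, or the full URL in @corresp on an empty element). The sample omits the TEI namespace for brevity, and the function name `authority_links` is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Sample combining both structures proposed above. Real ELTeC files also
# declare the TEI namespace, which is left out here to keep the sketch short.
SAMPLE = """
<teiHeader>
  <titleStmt>
    <title>...</title>
    <author>
      <name>last, first</name>
      <idno type="wikidata" corresp="https://wikidata.org/wiki/">Q0123456789</idno>
    </author>
    <respStmt>
      <name>Julia Röttgermann</name>
      <idno type="ORCID" corresp="https://orcid.org/0000-0002-1918-8117"/>
    </respStmt>
  </titleStmt>
</teiHeader>
"""


def authority_links(root):
    """Return (type, url) pairs for every <idno> that carries @corresp."""
    links = []
    for idno in root.iterfind(".//idno[@corresp]"):
        # @corresp followed by the element content; an empty element
        # contributes nothing, so @corresp is used as the full URL.
        url = idno.get("corresp") + (idno.text or "").strip()
        links.append((idno.get("type"), url))
    return links


print(authority_links(ET.fromstring(SAMPLE)))
```

Because the URL is rebuilt from the element alone, no lookup table of base URLs keyed on @type is needed, which is the property asked for above.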

lb42 commented 4 months ago

Why do you want to give the identifier value as an attribute rather than as content of the <idno>? What's wrong with e.g.

<idno type="ORCID">https://orcid.org/0000-0002-1918-8117</idno>
<idno type="wikidata" >https://wikidata.org/wiki/Q0123456789</idno>

Purists might object that the identifier and the URL using it should be distinguished, I suppose. In which case you could simply do

<ref type="ORCID">https://orcid.org/0000-0002-1918-8117</ref>
<ref  type="wikidata" >https://wikidata.org/wiki/Q0123456789</ref>

or even

<ref type="ORCID">https://orcid.org/<idno>0000-0002-1918-8117</idno></ref>
<ref type="wikidata">https://wikidata.org/wiki/<idno>Q0123456789</idno></ref>

morethanbooks commented 4 months ago

I don't consider myself a purist :) but yes, I would like to see the IDs and the URL explicitly encoded separately. I could live with <ref type="ORCID">https://orcid.org/<idno>0000-0002-1918-8117</idno></ref>, although I can imagine that this might be a bit problematic when reading the files with e.g. lxml.
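
On the lxml concern, a minimal sketch of how the nested variant reads in practice. It uses the standard library's `xml.etree.ElementTree`; `lxml.etree` offers the same `.text`, `.find()` and `.itertext()` interface, so the behaviour shown here should carry over, though that is an assumption rather than something tested against lxml.

```python
import xml.etree.ElementTree as ET

ref = ET.fromstring(
    '<ref type="ORCID">https://orcid.org/<idno>0000-0002-1918-8117</idno></ref>'
)

# .text holds only the text before the first child element, i.e. the base URL.
base_only = ref.text

# The identifier is the content of the nested <idno>.
identifier = ref.find("idno").text

# itertext() walks all text nodes in document order, giving the full URL.
full_url = "".join(ref.itertext())

print(base_only, identifier, full_url, sep="\n")
```

The likely trap is reading `ref.text` and expecting the whole URL; with mixed content, the full value has to be assembled from several text nodes.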

lb42 commented 4 months ago

Actually, thinking about this again, it's clear that supplying the "canonical" reference via an attribute rather than as content for an ident makes much better sense, for the simple reason that you might have (e.g.) a VIAF number for an author and also for a title, not to mention others. Much simpler to specify those values using @ref (or @corresp) on the appropriate element (author, title, etc.). These attributes are by definition URL values, so the full URL must be supplied, possibly abbreviated via a defined prefix.

christofs commented 4 months ago

I agree that there are some arguments for supplying the identifier as a @ref on the appropriate element (such as title, author, name). However, when there are multiple relevant identifiers (such as GND, Wikidata and VIAF), this becomes cumbersome and a bit less easy to extract.

I think that is why I still prefer the solution proposed above: <idno type="ORCID">https://orcid.org/0000-0002-1918-8117</idno>
or <ref type="ORCID">https://orcid.org/0000-0002-1918-8117</ref>

The <idno> element would be a child of the relevant element / entity, to make clear what it refers to. That's pretty clear, too, isn't it?

And I personally don't care for separating the domain / base URL from the identifier itself. Those who want to look up the information can do so using the full link, and those who want to record just the identifier can easily remove the base URL, which is always the same for a given type of identifier.

I especially don't like the nesting of ref and idno. What about this structure, though?

<idno type="ORCID" xml:base="https://orcid.org/">0000-0002-1918-8117</idno>

That seems pretty neat to me.
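
A minimal sketch of how a consumer might resolve this structure, using only the Python standard library. Treating the content of <idno> as a relative reference to be resolved against @xml:base is an assumption of this sketch, and the helper name `resolve_idno` is illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the predefined XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_idno(idno):
    """Join @xml:base and the element content into one URL."""
    base = idno.get(XML_BASE, "")
    return urljoin(base, (idno.text or "").strip())


idno = ET.fromstring(
    '<idno type="ORCID" xml:base="https://orcid.org/">0000-0002-1918-8117</idno>'
)
print(resolve_idno(idno))
```

The bare identifier stays available as `idno.text`, so both the identifier and the link can be read from the same element.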

lb42 commented 4 months ago

Using a child <idno> works well for some elements (like author) but less well for others, e.g. title. It's allowed by the default TEI content model for <title>, but then you either have to have mixed content or weird tagging like this:

<title><title>The real title</title><idno>the identifier</idno></title>

The Guidelines have examples using <idno> in the way you would like, but only as children of <publicationStmt>, which is not (I think) what we are looking for here.

Adding a @ref attribute, however, is possible and easy for all the elements for which we might want to link to an authority file. Is it really so difficult to extract individual values from multi-valued attributes? (You have to be able to do that to parse any TEI pointer value, after all.) How often will there be multiple values to supply?

I can, of course, make the schema support either approach, or both!

lb42 commented 4 months ago

And yes, I agree that distinguishing the identifier within the URL is a bit weird. Using @xml:base as you suggest seems simple enough though.

lb42 commented 4 months ago

Another reason for preferring the @ref solution is that it's the one we already explicitly suggest in the ELTeC doc, of course. So it has to remain as a possibility, or we break existing documents :-(
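
On the question of how hard it is to extract individual values from a multi-valued @ref, a minimal sketch using the Python standard library. The VIAF number is a made-up placeholder, the Wikidata id is the dummy one used earlier in this thread, and keying the result by hostname is a choice made for this sketch only (two references to the same authority would collide).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Placeholder values: the VIAF number is invented for illustration.
author = ET.fromstring(
    '<author ref="https://viaf.org/viaf/000000000 '
    'https://wikidata.org/wiki/Q0123456789">last, first</author>'
)


def refs_by_host(element):
    """Split a whitespace-separated @ref into a {hostname: url} mapping."""
    return {urlparse(url).hostname: url for url in element.get("ref", "").split()}


print(refs_by_host(author))
```

Splitting on whitespace is all that is needed as long as the attribute holds full URLs; values abbreviated with a defined prefix would first have to be expanded using the prefix definitions.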