TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

<idno> coverage #158

Closed TEITechnicalCouncil closed 8 years ago

TEITechnicalCouncil commented 15 years ago

<idno> should be revised to:

Original comment by: @laurentromary

TEITechnicalCouncil commented 8 years ago

This issue was originally assigned to SF user: kshawkin. Current user is: kshawkin.

TEITechnicalCouncil commented 15 years ago

For more context on this discussion, see https://listserv.indiana.edu/cgi-bin/wa-iub.exe?A2=ind0812B&L=TEILIB-L&T=0&F=&S=&P=876

Original comment by: nobody

TEITechnicalCouncil commented 15 years ago

The first proposal is simple and can be done straightaway; the second is less obvious, since <idno> is defined as identifying the object being documented, rather than e.g. its author.

Original comment by: @lb42

TEITechnicalCouncil commented 15 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 15 years ago

SF 2493417 consists of two parts. The first part asks for some extra examples showing that <idno> values are not necessarily numeric. Syd provided some examples in SF bug 2457147.

The second part of the feature request asks, with regard to <idno>, that we 'extend its scope so that it can treat unique identifiers for core components of a bibliographical reference, in particular, authors (it should thus be part of the content model of <author> among others)'. The rest of this comment discusses that second request.

It is clear that there are many advantages to the unique identification of scholarly authors: finding an author’s other articles, finding an author’s current affiliation, and relating non-article publications (weblog entries, etc.) all require a more robust way of identifying a person than by name. An illustration of that fact is given by [1]: the Mathematical Reviews author database contains 32 authors called "Wang, Wei" with no additional names. For further literature, see [2, 3, 4].

It should therefore be possible to identify scholarly authors by something other than their name. There exist, perhaps unfortunately, several initiatives to assign unique IDs to scholarly authors, such as Researcher ID (http://www.researcherid.com/) and Digital Author Identifier (http://www.surffoundation.nl/smartsite.dws?ch=ENG&id=13480). Others have argued that researchers should be identified through their OpenID accounts (http://openid.net/). National libraries have their own (overlapping) authority files. There is also an upcoming ISO standard for identifying names/entities: International Standard Name Identifier (http://www.isni.org/). Elsevier has its Scopus IDs.

It should be possible to store these author identifiers in (TEI) bibliographies. We could achieve that effect in a number of ways:

(1) use @key on an <author>’s <name>
(2) use @ref on an <author>’s <name>
(3) add <author> to att.canonical and use @key or @ref on <author>
(4) create a new element <authorid> and add it to <author>’s content model
(5) extend the scope of the existing element <idno> and add <idno> to <author>’s content model

Any solution will, however, have to cater for the fact that authors may have multiple digital author identifiers, corresponding to different schemes. E.g.:

This means that any solution relying on attributes will either need to somehow store the identification scheme in the attribute, or have to rely on parsing the value to guess which scheme applies. @key has the added problem that by definition it holds only one value, so even if key="researcherid:C-1234-2008" would work, it could not at the same time hold the International Standard Name Identifier for the researcher. @ref could hold multiple values, but they must be URIs; we could have ref="info:eu-repo/dai/nl/12456454 https://me.yahoo.com/johndoe61", but then software would have to guess which scheme applies.
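To make the guessing problem with @ref concrete, here is a minimal Python sketch of what consuming software might be forced to do: split the @ref value on whitespace and infer each URI's scheme from its prefix. The prefix table and the scheme labels ("nldai", "openid") are hypothetical, chosen only to match the example values above; they are not part of any TEI or identifier-scheme specification.

```python
# Hypothetical prefix-to-scheme table. Software reading @ref would need
# something like this, and no such table can ever be complete.
KNOWN_PREFIXES = {
    "info:eu-repo/dai/nl/": "nldai",
    "https://me.yahoo.com/": "openid",
}


def guess_schemes(ref_value: str) -> dict[str, str]:
    """Split a whitespace-separated @ref value and guess each URI's scheme."""
    guesses = {}
    for uri in ref_value.split():
        scheme = "unknown"  # fallback when no known prefix matches
        for prefix, label in KNOWN_PREFIXES.items():
            if uri.startswith(prefix):
                scheme = label
                break
        guesses[uri] = scheme
    return guesses


if __name__ == "__main__":
    ref = "info:eu-repo/dai/nl/12456454 https://me.yahoo.com/johndoe61"
    for uri, scheme in guess_schemes(ref).items():
        print(scheme, uri)
```

An OpenID hosted anywhere other than me.yahoo.com would already come back as "unknown", which is the fragility the argument points to: the scheme is only recoverable if someone maintains the table.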

This implies that for a robust solution we need a repeatable element that stores the identifier’s scheme as a type or scheme attribute, and the value either as text or as a value attribute. We can either create a new element for the purpose, e.g. <authorid>, or reuse an existing element.

The proposal here is to use the existing <idno> element. The need to identify authors is exactly analogous to the need to identify bibliographic elements such as articles or monographs, the element already has an appropriately generic name, and I see no reason not to use it. This does not involve, as Syd wrote on the TEI in Libraries mailing list (https://listserv.indiana.edu/cgi-bin/wa-iub.exe?A2=ind0901B&L=TEILIB-L&T=0&F=&S=&P=2774), a ‘semantic shift’: <idno> would have the same meaning it always had; it would just be applied to new elements.

This would involve:

We could then have, e.g.:

<author>
  <idno type="nldai">info:eu-repo/dai/nl/12456454</idno>
  <idno type="openid">https://me.yahoo.com/johndoe61</idno>
  John Doe
</author>
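As a rough illustration of why a repeatable element with a type attribute is easier to consume, here is a minimal Python sketch (standard-library ElementTree) that reads the <author> example into a mapping from scheme to identifiers. The function names are hypothetical; the sketch matches tags by local name so it works both on the bare example and on namespaced TEI (http://www.tei-c.org/ns/1.0), and it falls back to a made-up "unspecified" label when @type is missing.

```python
import xml.etree.ElementTree as ET

# The example from the proposal, written without a namespace.
AUTHOR_XML = """
<author>
  <idno type="nldai">info:eu-repo/dai/nl/12456454</idno>
  <idno type="openid">https://me.yahoo.com/johndoe61</idno>
  John Doe
</author>
"""


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so TEI-namespaced and bare tags both match."""
    return tag.rsplit("}", 1)[-1]


def author_identifiers(author: ET.Element) -> dict[str, list[str]]:
    """Map each <idno>/@type to the identifier values recorded for that scheme."""
    ids: dict[str, list[str]] = {}
    for child in author:
        if local_name(child.tag) == "idno":
            scheme = child.get("type", "unspecified")
            ids.setdefault(scheme, []).append((child.text or "").strip())
    return ids


if __name__ == "__main__":
    print(author_identifiers(ET.fromstring(AUTHOR_XML)))
```

The scheme comes straight from @type, so no prefix guessing is needed. Note that in this encoding "John Doe" ends up as the tail text of the second <idno>, a practical symptom of the mixed-content concern raised later in the thread.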

[1] TePaske-King, B. and Richert, N. (2001), 'The identification of authors in the Mathematical Reviews Database', Issues in Science and Technology Librarianship, 31.
[2] Bourne, Philip E. and Fink, J. Lynn (2008), 'I Am Not a Scientist, I Am a Number', PLoS Computational Biology, 4 (12), e1000247.
[3] Danskin, Alan, et al. (2008), 'A review of the current landscape in relation to a proposed Name Authority Service for UK repositories of research outputs', (JISC).
[4] Cals, J. W. L. and Kotz, D. (2008), 'Researcher identification: the right needle in the haystack', The Lancet, 371 (9631), 2152-53.

Original comment by: @pboot

TEITechnicalCouncil commented 15 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 15 years ago

On 2 April, Council decided that we should define a new phrase-level class for use in the content of bibliographic elements such as <author>, and that this class should contain <idno>.

Original comment by: nobody

TEITechnicalCouncil commented 15 years ago

FYI, the Library of Congress plans a new service (http://id.loc.gov/) giving URIs for their authority records. This is just one more example, like Researcher IDs, Digital Author Identifiers, and Scopus IDs, given above.

Original comment by: @kshawkin

TEITechnicalCouncil commented 15 years ago

Regarding http://id.loc.gov/, it was pointed out to me that LC does not plan at this time to add name authority records, apparently because the name authority file is jointly managed by LC and the members of NACO and LC is hesitant to make this data so easily available. In any case, the growing momentum behind Linked Data shows that we will increasingly have URIs for bits of data. People will want to point to these from their TEI documents.

Original comment by: @kshawkin

TEITechnicalCouncil commented 15 years ago

OCLC has set up a Linked Authority File, which provides a URI for records in the Library of Congress name authority file. For the author given at https://listserv.indiana.edu/cgi-bin/wa-iub.exe?A2=ind0812B&L=TEILIB-L&T=0&F=&S=&P=876, rather than giving http://orlabs.oclc.org/identities/lccn-n96-118820, we might give http://errol.oclc.org/laf/n96-118820.html (which is linked from the first URL).

Original comment by: @kshawkin

TEITechnicalCouncil commented 15 years ago

Two thoughts spring to mind. First, it is worth pointing out that in the example Peter provides, text content is put at the same "level" as element content, which most of us find disturbing. So I would suggest instead:

<author>
  <idno type="nldai">info:eu-repo/dai/nl/12456454</idno>
  <idno type="openid">https://me.yahoo.com/johndoe61</idno>
  <persName>John Doe</persName>
</author>

But more importantly, before you permit <idno> in the innards of <author>, <editor>, <publisher>, etc., I'd like to know what the explicit semantics are. It is problematic just to say "an identifier for the thingy indicated by the parent element", because the canonical parent of <idno> is <publicationStmt>.

I'm also a little worried about what multiple occurrences mean. Are the two <idno>s in the above example alternative identifiers of the same author (in this case a person)? Or are there three authors? If the answer seems immediately obvious to you ("alternative"), think about the analogous cases:

<author>
  <persName type="common">Peter Boot</persName>
  <persName type="regularized">Boot, Dr. Peter</persName>
  <persName xml:lang="es">Pedro Cargador</persName>
</author>

vs.

<author>
  <persName>Syd Bauman</persName>
  <persName>Peter Boot</persName>
  <persName>Kevin Hawkins</persName>
</author>

The Guidelines license the approach of putting multiple authors in one <author>.

Original comment by: nobody

TEITechnicalCouncil commented 15 years ago

My opinion is that we should definitely recommend having only one "author" per <author>, so that the semantics are clear: any information enclosed in an <author> element relates to a single person.
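Under the one-entity-per-<author> convention, a consumer can treat every child of an <author> as describing the same entity: several <persName>s are alternative names and several <idno>s are alternative identifiers. Here is a minimal Python sketch of that reading, assuming a hypothetical <bibl> with made-up authors ("Jane Roe" has no identifier) and hypothetical record and function names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical bibliography entry following the one-entity-per-<author> rule.
BIBL_XML = """
<bibl>
  <author>
    <persName>John Doe</persName>
    <idno type="nldai">info:eu-repo/dai/nl/12456454</idno>
  </author>
  <author>
    <persName>Jane Roe</persName>
  </author>
</bibl>
"""


@dataclass
class AuthorRecord:
    names: list[str] = field(default_factory=list)  # alternative names of ONE entity
    ids: dict[str, list[str]] = field(default_factory=dict)  # scheme -> identifiers


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def author_records(bibl: ET.Element) -> list[AuthorRecord]:
    """Build one record per <author>, assuming each <author> denotes exactly one entity."""
    records = []
    for author in bibl.iter():  # iter() walks the whole subtree, bibl itself included
        if local_name(author.tag) != "author":
            continue
        rec = AuthorRecord()
        for child in author:
            kind = local_name(child.tag)
            text = (child.text or "").strip()
            if kind == "persName":
                rec.names.append(text)
            elif kind == "idno":
                rec.ids.setdefault(child.get("type", "unspecified"), []).append(text)
        records.append(rec)
    return records


if __name__ == "__main__":
    for rec in author_records(ET.fromstring(BIBL_XML)):
        print(rec)
```

Applied to the second <persName> example quoted earlier in the thread (three people in one <author>), this reading would wrongly produce one person with three names, which is why the convention has to be stated in the prose rather than inferred from the markup.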

Original comment by: @laurentromary

TEITechnicalCouncil commented 15 years ago

I definitely prefer the one-entity-per-<author> method of encoding such information (remember that the author may be an <orgName> or a string other than a name like "anonymous" or "unknown"). That is what I always suggest people do. However, I'm not sure the TEI Guidelines could really get away with such a requirement now, for two reasons.

First, using one <author> to encode multiple authors is explicitly permitted in P5. Second, it would break the parallelism between <author> and <respStmt><resp>author</resp></respStmt>. The semantics of the latter are (I think) that each entity named with a <name> inside the <respStmt> is a separate entity who shared in that responsibility. (The only examples in the Guidelines are in German, so I don't know for sure what's going on, but they look to me as if they may be using <respStmt> in a manner that I would advise against.)

Original comment by: @sydb

TEITechnicalCouncil commented 14 years ago

Changing to amber, as there is a lot to discuss here.

Original comment by: @lb42

TEITechnicalCouncil commented 14 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 14 years ago

Connected to ID: 2949985?

Original comment by: sf_user_epierazzo

TEITechnicalCouncil commented 14 years ago

Agreed that the prose should legitimize use of <idno> as proposed; add <idno> to model.nameLike; the prose needs revision to address the issue of multiple authors within <author>. Avoid changing the content model of <author> if possible.

Original comment by: @lb42

TEITechnicalCouncil commented 14 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 14 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 14 years ago

<idno> has been added to model.nameLike, but the documentation needs updating.

Original comment by: @lb42

TEITechnicalCouncil commented 13 years ago

During today's session of the 2011 Council meeting, I agreed to revise the documentation but said I might not get to it until July 2011.

Original comment by: @kshawkin

TEITechnicalCouncil commented 12 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 12 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 12 years ago

Examples of <idno> used for DOI and ISSN were added long ago. But I've just removed the implication that <author> would be used for more than one author. See revision 9827.

Original comment by: @kshawkin

TEITechnicalCouncil commented 12 years ago

Original comment by: @kshawkin

TEITechnicalCouncil commented 12 years ago

My notes from Chicago meeting say that I was supposed to add examples of uses of <idno> for author identifiers to various places in the Guidelines, but I'm not sure this is an appropriate path forward now that we've been discussing use of @ref to give a URI or URN instead of a @key or <idno>. See http://purl.org/TEI/fr/3437509 and http://www.purl.org/TEI/FR/2919640 .

Original comment by: @kshawkin

TEITechnicalCouncil commented 12 years ago

Original comment by: @kshawkin

TEITechnicalCouncil commented 12 years ago

I think we need a little subcommittee to work on this whole mess -- <idno>, <ident>, <ref>, and <ptr>. I volunteer. Who else fancies working on this, and having a full proposal to bring to Ann Arbor in April?

Original comment by: @martindholmes

TEITechnicalCouncil commented 12 years ago

Perhaps the ISBN examples should be changed from using <idno> to <ptr/> or <ref>, as suggested at http://listserv.brown.edu/archives/cgi-bin/wa?A2=ind1203&L=TEI-L&T=0&O=D&P=6120 .

Original comment by: @kshawkin

TEITechnicalCouncil commented 12 years ago

See also http://purl.org/TEI/fr/3500566 .

Original comment by: @kshawkin

TEITechnicalCouncil commented 12 years ago

Per FR 3437509, we are still allowing @key but encouraging the use of tag URIs in @ref instead. The long comment by pboot earlier in this thread shows that if you want to associate more than one identifier with an <author> or <editor>, you can't do this with @key or @ref; instead, you need multiple <idno>s inside <author> or <editor>. And <ident> is unrelated. So the documentation on identifiers can be added.
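For readers unfamiliar with tag URIs, here is a rough Python sketch of what such an @ref value carries, using a simplified regular expression for the RFC 4151 syntax tag:authorityName,date:specific[#fragment], where the date is YYYY, YYYY-MM or YYYY-MM-DD. The pattern is a simplification of the RFC grammar, not a validator, and the example URI tag:example.org,2012:authors/jdoe is hypothetical.

```python
import re
from typing import Optional

# Simplified form of the RFC 4151 grammar:
#   tag:<authorityName>,<date>:<specific>[#<fragment>]
TAG_URI = re.compile(
    r"^tag:(?P<authority>[^,]+),"
    r"(?P<date>\d{4}(?:-\d{2}(?:-\d{2})?)?)"
    r":(?P<specific>[^#]*)"
    r"(?:#(?P<fragment>.*))?$"
)


def parse_tag_uri(uri: str) -> Optional[dict]:
    """Return the parts of a tag URI, or None if the string is not one."""
    match = TAG_URI.match(uri)
    return match.groupdict() if match else None


def parse_ref(ref_value: str) -> list:
    """@ref may hold several whitespace-separated URIs; parse each one."""
    return [(uri, parse_tag_uri(uri)) for uri in ref_value.split()]


if __name__ == "__main__":
    # Hypothetical tag URI for an author record.
    for uri, parts in parse_ref("tag:example.org,2012:authors/jdoe"):
        print(uri, parts)
```

The parsed authority says who minted the identifier, but nothing in a plain @ref value labels which identifier scheme (in the <idno>/@type sense) each URI belongs to, which is the gap multiple typed <idno>s fill.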

I've added examples of authors at revision 10377:

http://tei.svn.sourceforge.net/viewvc/tei/trunk/P5/Source/Guidelines/en/CO-CoreElements.xml?r1=10377&r2=10376&pathrev=10377

Further suggestions welcome.

Original comment by: @kshawkin

TEITechnicalCouncil commented 12 years ago

Original comment by: @kshawkin