SAA-SDT / eac-cpf-schema

https://eac.staatsbibliothek-berlin.de/

@xml:lang #151

Open SJagodzinski opened 3 years ago

SJagodzinski commented 3 years ago

Language of Element

Replace xml:lang with optional attribute @languageOfElement with data type NMTOKEN. Use @languageOfElement in all non-empty elements.

Creator of issue

  1. Silke Jagodzinski
  2. TS-EAS: EAC-CPF subgroup
  3. silkejagodzinski@gmail.com

Related issues / documents

  1. Remove xml ns to align with EAD 3 #27
  2. @xml:lang: adopt EAD 3 solution #28
  3. Language codes: adopt EAD 3 solution #29
  4. @scriptCode: remove and adjust tag library for @xml:lang/@lang Attribute #30

EAD3 Reconciliation

Summary: Indicates the language of the content of an element. Content of the attribute should be a code taken from ISO 639-1, ISO 639-2b, ISO 639-3, or another controlled list, as specified in the langencoding attribute in <control>. May be used consistently in a multi-lingual finding aid to specify which elements are written in which language. Available on all non-empty elements.

Data Type: NMTOKEN

Context

@xml:lang XML Language

Summary: Two-letter language code from the IANA registry as dictated by the W3C specification.

Description and Usage: The xml:lang may occur on any element intended to contain natural language content whenever information about the language of the content of this element and its children are needed. xml:lang should be used when the language of the element differs from the Language Code declared in the languageCode attribute on the <language> element within the <control> element. The values in the list are taken from the IANA Registry (http://www.iana.org/assignments/language-subtag-registry). The use of the IANA Registry code for languages in this context is outlined in the W3C specification. The syntax is specified at: http://www.w3.org/International/articles/language-tags/.

Data Type: IANA Registry for language codes.

Solution documentation: agreed solution for TL and guidelines

Summary: Indicates the language of the content of an element. Content of the attribute should be a code taken from ISO 639-1, ISO 639-2b, ISO 639-3, or another controlled list, as specified in the langencoding attribute in <control>. May be used consistently in a multilingual entity description to specify which elements are written in which language. Available on all non-empty elements.

Data Type: NMTOKEN

May occur within: <abstract>, <address>, <addressLine>, <agencyCode>, <agencyName>, <agent>, <alternativeSet>, <biogHist>, <chronItem>, <chronItemSet>, <chronList>, <citedRange>, <componentEntry>, <contact>, <contactLine>, <conventionDeclaration>, <date>, <dateRange>, <dateSet>, <description>, <descriptiveNote>, <event>, <eventDateTime>, <eventDescription>, <existDates>, <fromDate>, <function>, <functions>, <generalContext>, <geographicCoordinates>, <head>, <identityId>, <item>, <language>, <languageDeclaration>, <languageUsed>, <languagesUsed>, <legalStatus>, <legalStatuses>, <list>, <localControl>, <localDescription>, <localDescriptions>, <localTypeDeclaration>, <maintenanceAgency>, <maintenanceEvent>, <maintenanceHistory>, <mandate>, <mandates>, <nameEntry>, <nameEntrySet>, <occupation>, <occupations>, <otherAgencyCode>, <otherEntityType>, <otherEntityTypes>, <otherRecordId>, <p>, <part>, <place>, <placeName>, <placeRole>, <places>, <recordId>, <reference>, <relation>, <relationType>, <representation>, <rightsDeclaration>, <setComponent>, <shortCode>, <source>, <sources>, <span>, <structureOrGenealogy>, <targetEntity>, <targetRole>, <term>, <toDate>, <useDates>, <writingSystem>

Example encoding

fordmadox commented 3 years ago

The more I think about it, the more I think it's a mistake to follow EAD3 on this one. I don't think that we should ignore https://www.w3.org/TR/xml-i18n-bp/, specifically this recommendation:

It is not recommended to use your own attribute or element to specify the language of the content. The xml:lang attribute is supported by various XML technologies such as XPath and XSLT (e.g. the lang() function). Using something different would diminish the interoperability of your documents and reduce your ability to take advantage of some XML applications.

I've got the alpha schema set up to use the new attribute names, but I would also like to eventually create a branch of the schema that removes all of those attributes (aside from languagecode and scriptcode) and instead uses the "xml" namespace as intended.

Although we could continue to have EAD/S do its own thing and ignore best practices, it seems like a bad idea not to make the standard more interoperable with other XML standards like TEI, DITA, DocBook, MODS, etc., all of which use xml:lang, as well as RDF and other data serializations that also seem to have settled on doing the same. Why make it more difficult to move between all of those and require a local mapping to do so (and lose out on built-in features in XPath, etc.)? Just my two cents 😄

kerstarno commented 3 years ago

I have to admit that I am still not convinced that the argument is strong enough to merit newly introducing @xml:lang in a future version of EAD.

Assuming that we did, a few additional thoughts:

kerstarno commented 3 years ago

Btw - just found this in the MODS user guide (https://www.loc.gov/standards/mods/userguide/attributes.html#lang):

citation starts

lang @lang indicates the language of the content of an element, using a code from ISO 639-2/b.

Example

<name type="personal">
<namePart type="given">Jack</namePart>
<namePart type="family">May</namePart>
<namePart type="termsOfAddress">I</namePart>
<description lang="eng">District Commissioner</description>
<description lang="fre">Préfet de région</description>
</name>

xml:lang @xml:lang serves the same purpose as @lang, but follows the W3C documentation that indicates using the IANA language subtag registry, which includes codes from the ISO language and script standards.

Example

<titleInfo xml:lang="fr" type="translated">
<nonSort>L'</nonSort>
<title>homme qui voulut être roi</title>
</titleInfo>

citation ends

Assuming that we do not want to use both attributes next to each other and given that we've decided to open up the options of how languages could be encoded (i.e. not only IANA, but also the three variations of ISO 639 plus other language encodings), I'd be back at using an attribute of our own rather than going back to @xml:lang.

fordmadox commented 3 years ago

The TEI guidelines provide a great overview here about how they encode languages: https://www.tei-c.org/release/doc/tei-p5-doc/en/html/CH.html#CHSH (which stresses the following: "For maximal compatibility with existing processes, the identifier for the language must be constructed as in Best Current Practice 47")

As time goes on, I grow more convinced that it's better to keep the "xml" namespace in EAC for id, base, lang, and adding space, since I don't really see the need for EAC/D to ignore that convention (and to make it more difficult to share data). In the two examples from MODS, the first won't work, for instance, if I want to use something like the built-in "lang" function from XPath (https://www.w3.org/TR/xpath-functions-31/#func-lang) to determine the language, whereas the second one does.
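To make the XPath point concrete, here is a minimal Python sketch (not from the thread) that approximates the matching rule of lang() with the standard library only. The `<mods>` wrapper, the helper names `effective_xml_lang` and `lang_matches`, and the omission of the MODS namespace are assumptions for illustration; the inner elements are taken from the two MODS examples quoted above.

```python
import xml.etree.ElementTree as ET

# Parsers expose xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical wrapper around the two quoted MODS examples (namespace omitted).
MODS_SNIPPET = """
<mods>
  <name type="personal">
    <description lang="eng">District Commissioner</description>
    <description lang="fre">Préfet de région</description>
  </name>
  <titleInfo xml:lang="fr" type="translated">
    <nonSort>L'</nonSort>
    <title>homme qui voulut être roi</title>
  </titleInfo>
</mods>
"""


def effective_xml_lang(element, parents):
    """Return the nearest xml:lang on the element or one of its ancestors."""
    node = element
    while node is not None:
        value = node.get(XML_LANG)
        if value is not None:
            return value
        node = parents.get(node)
    return None


def lang_matches(element, parents, wanted):
    """Approximate XPath lang(): equal, or equal up to a '-' suffix, ignoring case."""
    value = effective_xml_lang(element, parents)
    if not value:
        return False
    value, wanted = value.lower(), wanted.lower()
    return value == wanted or value.startswith(wanted + "-")


root = ET.fromstring(MODS_SNIPPET)
# ElementTree has no parent pointers, so build a child -> parent map once.
parents = {child: parent for parent in root.iter() for child in parent}

title = root.find("./titleInfo/title")
descriptions = root.findall("./name/description")

print(lang_matches(title, parents, "fr"))                        # True, inherited from <titleInfo>
print([lang_matches(d, parents, "fre") for d in descriptions])   # [False, False]
print([d.get("lang") for d in descriptions])                     # ['eng', 'fre']
```

The plain `lang` values are still readable as ordinary attributes, but nothing that relies on xml:lang semantics (including inheritance from a parent element) will see them; the same would hold for an attribute such as languageOfElement.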

All that said, we've got languageOfElement and scriptOfElement in the development branch of EAC, which aligns it with the path taken by EAD.

kerstarno commented 3 years ago

Just as a note: "the path taken by EAD" only means not having introduced the XML namespace when defining EAD3. :-)

As for potentially going back on the decision with regard to XML namespace, this would mean:

kerstarno commented 3 years ago

Tested as part of Schema Team's schema testing:

The above applies to both schemas, RNG and XSD.

SJagodzinski commented 3 years ago

@fordmadox, @kerstarno: Please keep the lang attributes as they are: not available in <multipleIdentities>, <entityType> and <objectXMLWrap>

List will be completed

kerstarno commented 3 years ago

@SJagodzinski thanks for the confirmation.

With this, the attribute is ready.

@fordmadox please take note of <multipleIdentities> not having language attribution in EAC-CPF 2.0 anymore, i.e. we will need to think about a transformation strategy in this case.
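One possible shape for such a transformation, sketched in Python with the standard library only. This is an assumption about how a migration could work, not the strategy the team adopted: it renames xml:lang to languageOfElement wherever the attribute is still allowed and only reports the three excluded elements, leaving the decision about their language information to a human. The function name `migrate_language_attributes` and the sample record are hypothetical, and elements are matched by local name so the sketch does not depend on a particular namespace URI.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Elements named in the thread as not carrying the language attributes.
NO_LANGUAGE_ATTRIBUTE = {"multipleIdentities", "entityType", "objectXMLWrap"}


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def migrate_language_attributes(root):
    """Rename xml:lang to languageOfElement in place.

    Returns a list of (element name, language value) pairs for elements
    where the new attribute is not available; those are left untouched
    so that a reviewer can decide what to do with the value.
    """
    needs_review = []
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue  # skip comments and processing instructions
        value = element.get(XML_LANG)
        if value is None:
            continue
        name = local_name(element.tag)
        if name in NO_LANGUAGE_ATTRIBUTE:
            needs_review.append((name, value))
            continue
        del element.attrib[XML_LANG]
        element.set("languageOfElement", value)
    return needs_review


# Hypothetical, heavily abbreviated record for illustration only.
sample = ET.fromstring(
    '<multipleIdentities xml:lang="en">'
    "<cpfDescription><description>"
    '<biogHist xml:lang="de"><p>Beispiel</p></biogHist>'
    "</description></cpfDescription>"
    "</multipleIdentities>"
)

review = migrate_language_attributes(sample)
bioghist = sample.find(".//biogHist")
print(review)                                # [('multipleIdentities', 'en')]
print(bioghist.get("languageOfElement"))     # de
```

Whether a value reported for <multipleIdentities> should be pushed down to its children or dropped is exactly the open question raised above; the sketch deliberately does not decide it.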

SJagodzinski commented 3 years ago

Recommendation of IETF language tags needs to be discussed, also with respect to feedback from the CfC.

SJagodzinski commented 2 years ago

Asked community about use of IETF language tags in @languageOfElement (which replaces @xml:lang) in call for comments and did not receive any feedback.

EAC-CPF team meeting, 8 Aug 2021:

Agreed to recommend the use of IETF language tags in @languageOfElement, create entry in Best Practice Guide for this. EAD team will follow EAC-CPF decision.
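Because the data type stays NMTOKEN, a schema alone will not tell an IETF language tag apart from any other token, so the recommendation would have to be checked outside the schema. The following Python sketch shows one rough way to do that. Both patterns are simplifications and assumptions of this example: the NMTOKEN pattern covers ASCII name characters only, and the tag pattern covers only the language[-Script][-Region] shape, not the full BCP 47 grammar or the IANA registry.

```python
import re

# ASCII-only approximation of XML NMTOKEN (the real definition admits
# many non-ASCII name characters as well).
NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:-]+")

# Simplified subset of the IETF language tag shape: language[-Script][-Region].
# Variants, extensions and private-use subtags are deliberately not covered.
SIMPLE_LANGUAGE_TAG = re.compile(
    r"[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|[0-9]{3}))?"
)


def check_language_value(value):
    """Report whether a value fits NMTOKEN and the simplified tag shape."""
    return {
        "nmtoken": NMTOKEN_ASCII.fullmatch(value) is not None,
        "simple_ietf_shape": SIMPLE_LANGUAGE_TAG.fullmatch(value) is not None,
    }


for candidate in ["fr", "de-DE", "zh-Hant-TW", "fre", "en_US", "en US"]:
    print(candidate, check_language_value(candidate))
```

A shape check like this cannot distinguish a registered subtag from an ISO 639-2b code of the same length ("fre" passes both tests), so a real validation would need a lookup against the IANA language subtag registry.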