Open SJagodzinski opened 3 years ago
The more I think about it, the more I think it's a mistake to follow EAD3 on this one. I don't think that we should ignore https://www.w3.org/TR/xml-i18n-bp/, specifically this recommendation:
It is not recommended to use your own attribute or element to specify the language of the content. The xml:lang attribute is supported by various XML technologies such as XPath and XSLT (e.g. the lang() function). Using something different would diminish the interoperability of your documents and reduce your ability to take advantage of some XML applications.
I've got the alpha schema set up to use the new attribute names, but I would also like to eventually create a branch of the schema that removes all of those attributes (aside from languagecode and scriptcode) and instead uses the "xml" namespace as intended.
Although we could let EAD/S continue to do its own thing and ignore best practices, it seems like a bad idea not to make the standard more interoperable with other XML standards like TEI, DITA, DocBook, MODS, etc., all of which use xml:lang, as well as RDF and other data serializations that also seem to have settled on doing the same. Why make it more difficult to move between all of those and require a local mapping to do so (and lose out on built-in features in XPath, etc.)? Just my two cents 😄
I have to admit that I am still not convinced that the argument is strong enough to merit newly introducing @xml:lang in a future version of EAD.
Assuming that we did, a few additional thoughts:
- If we introduce @xml:lang, we should include ALL attributes from the XML namespace, i.e. also (re)introducing @xml:id, @xml:base, and @xml:space, not only @xml:lang.
- @languageCode and @scriptCode as a standardised representation of the <language> and <writingSystem> elements used with <languageUsed>, <langmaterial>, and <languageSet>.
- @xml:lang is character data. The recommended values themselves come from IANA.
- xsd:language, as mentioned in #97 for its use by RDF: this is more specific than what's given with the XML namespace. xsd:language uses a pattern - [a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})* - against which it validates (see https://www.w3.org/TR/xmlschema11-2/#language) and the following formats seem to be the most common:
Btw - just found this in the MODS user guide (https://www.loc.gov/standards/mods/userguide/attributes.html#lang):
citation starts
lang
@lang indicates the language of the content of an element, using a code from ISO 639-2/b.
Example
<name type="personal">
<namePart type="given">Jack</namePart>
<namePart type="family">May</namePart>
<namePart type="termsOfAddress">I</namePart>
<description lang="eng">District Commissioner</description>
<description lang="fre">Préfet de région</description>
</name>
xml:lang
@xml:lang serves the same purpose as @lang, but follows the W3C documentation that indicates using the IANA language subtag registry, which includes codes from the ISO language and script standards.
Example
<titleInfo xml:lang="fr" type="translated">
<nonSort>L'</nonSort>
<title>homme qui voulut être roi</title>
</titleInfo>
citation ends
Assuming that we do not want to use both attributes next to each other and given that we've decided to open up the options of how languages could be encoded (i.e. not only IANA, but also the three variations of ISO 639 plus other language encodings), I'd be back at using an attribute of our own rather than going back to @xml:lang.
The TEI guidelines provide a great overview here about how they encode languages: https://www.tei-c.org/release/doc/tei-p5-doc/en/html/CH.html#CHSH (which stresses the following: "For maximal compatibility with existing processes, the identifier for the language must be constructed as in Best Current Practice 47")
As time goes on, I grow more convinced that it's better to keep the "xml" namespace in EAC for id, base, lang, and adding space, since I don't really see the need for EAC/D to ignore that convention (and to make it more difficult to share data). In the two examples from MODS, the first won't work, for instance, if I want to use something like the built-in "lang" function from XPath (https://www.w3.org/TR/xpath-functions-31/#func-lang) to determine the language, whereas the second one does.
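To make the difference between the two MODS examples concrete, here is a minimal sketch in Python. It does not call a real XPath engine; it only approximates what lang() does (inherit the nearest xml:lang, then compare case-insensitively on the tag or its prefix). The helper names `effective_languages` and `lang_matches` and the trimmed `<mods>` wrapper are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed from the two MODS examples quoted in this thread;
# the <mods> wrapper is only here to get a single root element.
MODS_SAMPLE = """
<mods>
  <name type="personal">
    <description lang="fre">Préfet de région</description>
  </name>
  <titleInfo xml:lang="fr" type="translated">
    <nonSort>L'</nonSort>
    <title>homme qui voulut être roi</title>
  </titleInfo>
</mods>
"""


def effective_languages(root):
    """Map each element to the xml:lang value it inherits (or None)."""
    result = {}

    def walk(element, inherited):
        current = element.get(XML_LANG, inherited)
        result[element] = current
        for child in element:
            walk(child, current)

    walk(root, None)
    return result


def lang_matches(language, wanted):
    """Approximate XPath lang(): match the tag itself or a 'wanted-' prefix."""
    if language is None:
        return False
    language, wanted = language.lower(), wanted.lower()
    return language == wanted or language.startswith(wanted + "-")


root = ET.fromstring(MODS_SAMPLE)
languages = effective_languages(root)
title = root.find("./titleInfo/title")
description = root.find("./name/description")

# <title> inherits xml:lang from <titleInfo>.
print(lang_matches(languages[title], "fr"))
# The plain lang="fre" attribute is invisible to this kind of lookup.
print(lang_matches(languages[description], "fr"))
```

With a local attribute such as @lang or @languageOfElement, the inheritance and matching logic above would have to be reimplemented against that attribute name in every consuming tool, which is the local mapping cost mentioned earlier in the thread.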
All that said, we've got languageOfElement and scriptOfElement in the development branch of EAC, which aligns it with the path taken by EAD.
Just as a note: "the path taken by EAD" only means not having introduced the XML namespace when defining EAD3. :-)
As for potentially going back on the decision with regard to XML namespace, this would mean:
- @xml:space;
- @scriptOfElement;
- @script as in current EAD3 (I wouldn't use @scriptCode for providing the ISO code for the script named in the element <writingSystem> and for providing an ISO code relating to the content of any other element as EAC-CPF currently does);
- @xml:lang which would enable them to encode language and script in one value if script information is seen as essential.

Tested as part of Schema Team's schema testing:
- @xml:lang does not exist anymore in the draft schema
- @languageOfElement is used with 86 out of 89 elements in the draft for EAC-CPF 2.0
  - This adds <control> (#81), <cpfDescription> (#79), <eac> (#78), <identity> (#114), and <relations> (#210) to the 81 elements listed above in the solution documentation. Could you please confirm if these should or should not have the language attributes?
  - <multipleIdentities> (#80) should maybe have them, too (see also below)? @xml:lang currently is available for <multipleIdentities> (same as the other elements mentioned) in EAC-CPF 1.0.
  - <description> (#138) should then still retain the language attributes as the only high-level wrapper element?
- The three elements without @languageOfElement are:
  - <multipleIdentities> - to be clarified (see above)
  - <entityType> - as it does not have text
  - <objectXMLWrap> - as its sub-elements are not from the EAS namespace
- @languageOfElement is always available alongside @scriptOfElement (#152)

The above applies to both schemas, RNG and XSD.
@fordmadox, @kerstarno: Please keep the lang attributes as they are: not available in <multipleIdentities>, <entityType> and <objectXMLWrap>.
List will be completed
@SJagodzinski thanks for the confirmation.
With this, the attribute is ready.
@fordmadox please take note of <multipleIdentities> not having language attribution in EAC-CPF 2.0 anymore, i.e. we will need to think about a transformation strategy in this case.
Recommendation of IETF language tags needs to be discussed, also with respect to feedback from the CfC.
Asked community about use of IETF language tags in @languageOfElement (which replaces @xml:lang) in call for comments and did not receive any feedback.
EAC-CPF team meeting, 8 Aug 2021:
Agreed to recommend the use of IETF language tags in @languageOfElement, create entry in Best Practice Guide for this.
EAD team will follow EAC-CPF decision.
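For the Best Practice Guide entry, a rough sketch of how a value in @languageOfElement could be checked against the xsd:language pattern quoted earlier in this thread, and how a combined language and script value could be taken apart. This is only an illustration in Python: the schema data type itself is NMTOKEN, which is looser than this pattern, and the function names and the "four-letter second subtag is a script" rule are simplifying assumptions, not the full BCP 47 grammar.

```python
import re

# Pattern quoted in this thread from the XSD 1.1 definition of xsd:language.
XSD_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_xsd_language(value):
    """Return True if the whole value matches the xsd:language pattern."""
    return XSD_LANGUAGE.fullmatch(value) is not None


def split_language_and_script(tag):
    """Naively split a tag such as 'sr-Latn' into (language, script or None).

    Assumption: a four-letter alphabetic second subtag is a script code.
    Extlang, variant, and extension subtags are not handled.
    """
    subtags = tag.split("-")
    script = None
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        script = subtags[1]
    return subtags[0], script


# Print each candidate value with the result of the pattern check.
for candidate in ["fr", "fre", "sr-Latn", "zh-Hant-TW", "fr_CA", ""]:
    print(repr(candidate), is_xsd_language(candidate))
```

Note that both an IETF tag like "fr" and an ISO 639-2/b code like "fre" pass this pattern, so a pattern check alone cannot tell which code list a value was taken from; that still depends on what is declared for the record.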
Language of Element
Replace xml:lang with optional attribute @languageOfElement with data type NMTOKEN. Use @languageOfElement in all non-empty elements.
Creator of issue
Related issues / documents
- Remove xml ns to align with EAD 3 #27
- @xml:lang: adopt EAD 3 solution #28
- Language codes: adopt EAD 3 solution #29
- @scriptCode: remove and adjust tag library for @xml:lang/@lang Attribute #30
EAD3 Reconciliation
Summary: Indicates the language of the content of an element. Content of the attribute should be a code taken from ISO 639-1, ISO 639-2b, ISO 639-3, or another controlled list, as specified in the langencoding attribute in <control>. May be used consistently in a multi-lingual finding aid to specify which elements are written in which language. Available on all non-empty elements.
Data Type: NMTOKEN
Context
@xml:lang XML Language
Summary: Two-letter language code from the IANA registry as dictated by the W3C specification.
Description and Usage: The xml:lang may occur on any element intended to contain natural language content whenever information about the language of the content of this element and its children are needed. xml:lang should be used when the language of the element differs from the Language Code declared in the languageCode attribute on the element within the element. The values in the list are taken from the IANA Registry (http://www.iana.org/assignments/language-subtag-registry). The use of the IANA Registry code for languages in this context is outlined in the W3C specification. The syntax is specified at: http://www.w3.org/International/articles/language-tags/.
Data Type: IANA Registry for language codes.
Solution documentation: agreed solution for TL and guidelines
Summary: Indicates the language of the content of an element. Content of the attribute should be a code taken from ISO 639-1, ISO 639-2b, ISO 639-3, or another controlled list, as specified in the langencoding attribute in <control>. May be used consistently in a multi-lingual entities description to specify which elements are written in which language. Available on all non-empty elements.
Data Type: NMTOKEN
May occur within: <abstract>, <address>, <addressLine>, <agencyCode>, <agencyName>, <agent>, <alternativeSet>, <biogHist>, <chronItem>, <chronItemSet>, <chronList>, <citedRange>, <componentEntry>, <contact>, <contactLine>, <conventionDeclaration>, <date>, <dateRange>, <dateSet>, <description>, <descriptiveNote>, <event>, <eventDateTime>, <eventDescription>, <existDates>, <fromDate>, <function>, <functions>, <generalContext>, <geographicCoordinates>, <head>, <identityId>, <item>, <language>, <languageDeclaration>, <languageUsed>, <languagesUsed>, <legalStatus>, <legalStatuses>, <list>, <localControl>, <localDescription>, <localDescriptions>, <localTypeDeclaration>, <maintenanceAgency>, <maintenanceEvent>, <maintenanceHistory>, <mandate>, <mandates>, <nameEntry>, <nameEntrySet>, <occupation>, <occupations>, <otherAgencyCode>, <otherEntityType>, <otherEntityTypes>, <otherRecordId>, <p>, <part>, <place>, <placeName>, <placeRole>, <places>, <recordId>, <reference>, <relation>, <relationType>, <representation>, <rightsDeclaration>, <setComponent>, <shortCode>, <source>, <sources>, <span>, <structureOrGenealogy>, <targetEntity>, <targetRole>, <term>, <toDate>, <useDates>, <writingSystem>
Example encoding