DARIAH-ERIC / lexicalresources

Data space of the DARIAH Lexical Resources Working Group
https://dariah-eric.github.io/lexicalresources/
BSD 2-Clause "Simplified" License

xml:lang should be compulsory on entry #30

Closed: ttasovac closed this issue 6 years ago.

xlhrld commented 6 years ago

This would make sense if a single resource contained articles in different languages, which doesn't happen very often. I'd rather suggest requiring TEI/text/@xml:lang while still allowing xml:lang on entry as optional. We rely on inheritance all over the place (e.g. see the examples with cit[@type="etymon"]), so why not with xml:lang here?
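To make the inheritance argument concrete: a minimal sketch, assuming Python's standard xml.etree.ElementTree and a hypothetical sample dictionary, of how the language in scope for an entry is resolved when @xml:lang is set only on <text>. The helper name effective_lang is illustrative, not part of TEI Lex-0; it returns the entry's own value or that of its nearest ancestor.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
XML_NS = "{http://www.w3.org/XML/1998/namespace}"  # namespace of xml:lang / xml:id

def effective_lang(elem, parents):
    """Return elem's own xml:lang, or the value of its nearest ancestor that has one."""
    node = elem
    while node is not None:
        lang = node.get(XML_NS + "lang")
        if lang is not None:
            return lang
        node = parents.get(node)  # climb one level; the root has no parent -> None
    return None

# Hypothetical dictionary: xml:lang is set once on <text>, not on the entries.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text xml:lang="de"><body>
  <entry xml:id="haus"><form type="lemma"><orth>Haus</orth></form></entry>
  <entry xml:id="baum"><form type="lemma"><orth>Baum</orth></form></entry>
</body></text></TEI>"""

root = ET.fromstring(SAMPLE)
# ElementTree has no parent pointers, so build a child -> parent map once.
parents = {child: parent for parent in root.iter() for child in parent}
for entry in root.iter(TEI_NS + "entry"):
    print(entry.get(XML_NS + "id"), effective_lang(entry, parents))
# haus de
# baum de
```

Resolving the language this way needs the whole document (the parent map), which is exactly the context that is lost once entries are taken out of it.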

ttasovac commented 6 years ago

I used to be against this idea; after all, that's what we have inheritance for. But now I'm in favor of it, because I'm thinking of the use cases in which we pool different dictionaries together... it's just easier to filter entries based on their language directly than by going up and down the hierarchy chain.

I know it's also not hard to do /TEI[@xml:lang='de']//entry but I think we can err on the side of overexplicitness in this case.
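A minimal sketch of the pooling case, assuming Python's xml.etree.ElementTree and two hypothetical source dictionaries; the helpers pool_entries, filter_by_lang and lemma are illustrative names. Entries are pulled out of their documents into one flat list. In ElementTree an element has no link back to its ancestors, so a filter like /TEI[@xml:lang='de']//entry is no longer available and only the entry's own @xml:lang can be used.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Two hypothetical source dictionaries whose entries carry their own @xml:lang.
DICT_DE = """<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="de"><text><body>
  <entry xml:lang="de"><form type="lemma"><orth>Haus</orth></form></entry>
  <entry xml:lang="de"><form type="lemma"><orth>Baum</orth></form></entry>
</body></text></TEI>"""
DICT_SR = """<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="sr"><text><body>
  <entry xml:lang="sr"><form type="lemma"><orth>kuća</orth></form></entry>
</body></text></TEI>"""

def pool_entries(*sources):
    """Pull every entry out of its source document into one flat list."""
    return [e for src in sources for e in ET.fromstring(src).iter(TEI_NS + "entry")]

def filter_by_lang(entries, lang):
    """Filter on the entry's own @xml:lang; no ancestor context is needed."""
    return [e for e in entries if e.get(XML_LANG) == lang]

def lemma(entry):
    """Return the text of the entry's form/orth."""
    return entry.find(f"{TEI_NS}form/{TEI_NS}orth").text

pool = pool_entries(DICT_DE, DICT_SR)
print([lemma(e) for e in filter_by_lang(pool, "de")])  # ['Haus', 'Baum']
```

If the entries relied only on TEI/@xml:lang or text/@xml:lang, filter_by_lang would return nothing for this pool, which is the trade-off being discussed here.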

xlhrld commented 6 years ago

But TEI Lex-0 is a baseline encoding for dictionaries, not just for entries. For a single dictionary, @xml:lang on //text should be required (that's probably uncontroversial, because it's the minimum requirement to make inheritance work in the first place). Keeping @xml:lang optional on //entry would still allow for the anticipated pooling use case, because you can always make the inheritance explicit in your sources by simply copying //text/@xml:lang down to every (possibly embedded) entry. This would still result in TEI Lex-0-conformant markup. We just shouldn't force everyone to cater for this use case.
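The "make the inheritance explicit" step could look roughly like this: a sketch using Python's xml.etree.ElementTree on a hypothetical sample, where make_entry_lang_explicit is an illustrative name. It walks the tree carrying the xml:lang currently in scope and writes it onto every entry, including nested ones, that does not already declare its own.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def make_entry_lang_explicit(elem, inherited=None):
    """Copy the in-scope xml:lang onto every entry (nested ones too) that lacks it.

    An entry that already declares xml:lang keeps its value, and that value
    becomes the one inherited by its descendants.
    """
    own = elem.get(XML_LANG)
    in_scope = own if own is not None else inherited
    if elem.tag == TEI_NS + "entry" and own is None and in_scope is not None:
        elem.set(XML_LANG, in_scope)
    for child in elem:
        make_entry_lang_explicit(child, in_scope)

# Hypothetical dictionary: language declared once on <text>, one nested entry,
# and one entry that already states its own language.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text xml:lang="de"><body>
  <entry xml:id="e1"><form type="lemma"><orth>Haus</orth></form>
    <entry xml:id="e2"><form type="lemma"><orth>Hausbau</orth></form></entry>
  </entry>
  <entry xml:id="e3" xml:lang="la"><form type="lemma"><orth>domus</orth></form></entry>
</body></text></TEI>"""

root = ET.fromstring(SAMPLE)
make_entry_lang_explicit(root)
print({e.get(XML_ID): e.get(XML_LANG) for e in root.iter(TEI_NS + "entry")})
# {'e1': 'de', 'e2': 'de', 'e3': 'la'}
```

Because the transformation only adds attributes whose values are already implied by inheritance, the output stays equivalent to the input; it just no longer depends on the surrounding document.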

Following your rationale, we would have to require @xml:lang on many other elements as well. People could be tempted to pool etymologies or //form[@type="headwords"] (which is actually a rather common use case, too) and many more things within //entry.

laurentromary commented 6 years ago

The argument was to make the object language of each entry explicit, rather than letting an @xml:lang that is ambiguous (in that it is intended for the working language) be inherited "by accident". Doing so, we get entries that have some global autonomy and are well documented from a lexicographic point of view.

ttasovac commented 6 years ago

I completely understand @xlhrld's reservations, but I think it's also important for us to think beyond the dictionary as an XML document. This is what "global autonomy" of the entry means to me, whether we're just quoting one entry, or converting the dictionary to a relational database (god forbid! 😄) ...

xlhrld commented 6 years ago

I'm fine with this »global autonomy of entries« – as long as this is prominently stated as a major objective of TEI Lex-0 in the spec. Otherwise it just smells like an arbitrary privilege for entry.