CoEDL / nyingarn-workspace

The Nyingarn data ingest and preparation application
GNU General Public License v3.0

Question: How should users mark up a particular language? #172

Open sophlew opened 1 year ago

sophlew commented 1 year ago

Example from Amy here: DumontDurville_1834-381. How should the French language be marked up? The screenshot shows the current error message. (Screenshot 2023-05-01 at 12 26 12 pm)

Conal-Tuohy commented 1 year ago

To tag the language of the text contained in a particular element, users should add the xml:lang attribute to that element, like so:

<line xml:lang="fr">VOCABULAIRE</line>

Here, the xml:lang attribute is attached to a line element, but it can go on any element, and applies to all the text contained in that element.

The language element shouldn't appear in the transcription at all; it's only for use inside the teiHeader, within a langUsage element which lists the languages appearing in the transcription. There, the ident attribute identifies the language itself, and the xml:lang attribute identifies the language in which that language is named. For example, here's the French language with the English name "French":

<language ident="fr" xml:lang="en">French</language>

or, the Spanish name for the French language:

<language ident="fr" xml:lang="es">francés</language>
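
To show where these language elements sit, here is a minimal sketch of a header fragment. In TEI, langUsage belongs inside profileDesc within the teiHeader. The second entry (Italian) and the omission of the mandatory fileDesc are assumptions made for illustration; this is not output from the workspace.

```xml
<teiHeader>
  <!-- fileDesc (required in a complete header) is omitted from this sketch -->
  <profileDesc>
    <langUsage>
      <!-- ident: the language being listed; xml:lang: the language its name is written in -->
      <language ident="fr" xml:lang="en">French</language>
      <language ident="it" xml:lang="en">Italian</language>
    </langUsage>
  </profileDesc>
</teiHeader>
```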

There is also a lang element which can be used inside the transcription, to tag the names of languages (in the same way that e.g. persName can tag the name of a person mentioned in the text), e.g.

<lang>French</lang>

Conal-Tuohy commented 1 year ago

Generally, the main language of a TEI text would be tagged (with an xml:lang attribute) on the text element of the document. This would indicate that the entire text is in that language, except where overridden by xml:lang attributes attached to individual elements within the text. In the Nyingarn workspace no one gets to see the text element, because they're transcribing within surface elements which each represent just one page, and the text element is created as a wrapper only when the TEI file is exported.

But I think the workspace does allow someone to mark the main language of a document in a metadata-entry form, and strictly speaking this should end up encoded as an xml:lang attribute when the entire TEI file is reconstituted. I am pretty sure it doesn't do this currently, though.
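
As a rough sketch of the default-and-override behaviour described above, an exported file might look something like this. The choice of "en" as the main language, the body and p elements, and the sample content are all assumptions for illustration only; the actual export may be structured differently.

```xml
<text xml:lang="en">
  <body>
    <!-- inherits xml:lang="en" from the enclosing text element -->
    <p>A vocabulary collected on the voyage.</p>
    <!-- overrides the inherited value for this element and its contents only -->
    <p xml:lang="fr">VOCABULAIRE</p>
  </body>
</text>
```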

nthieberger commented 1 year ago

I've been doing this for the Italian in New Norcia 38

Carne - pálgò flesh, meat

Ultimately it would be good to have the language word encoded too.

On Mon, 1 May 2023 at 13:12, Conal Tuohy @.***> wrote:


    Generally, the main language of a TEI text would be tagged (with an xml:lang attribute) on the text element of the document. In the Nyingarn workspace no-one gets to see the body element because they're transcribing within surface elements which each represent just one page. But the workspace does allow someone to mark the main language of a document in a metadata-entry form, I think, and this should end up as xml:lang attributes when the entire TEI file is reconstituted. I am pretty sure it doesn't do this currently, though.

    • Check that a document's main language is converted to a @.***:id when a TEI document is exported


Conal-Tuohy commented 1 year ago

I think if the text as a whole is tagged with the Indigenous language, then tagging all the Italian words as exceptions will mean the whole text is tagged correctly.
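
A minimal sketch of that approach, reusing the entry quoted earlier in the thread. The language code "nys" is only a placeholder for the Indigenous language, and wrapping the exceptions in the TEI foreign element is an assumption; whether the workspace accepts that element is untested here. The English gloss is tagged as an exception in the same way as the Italian headword.

```xml
<!-- "nys" is a placeholder code; substitute the document's actual main language -->
<line xml:lang="nys">
  <foreign xml:lang="it">Carne</foreign> - pálgò
  <foreign xml:lang="en">flesh, meat</foreign>
</line>
```

Anything not wrapped in an element with its own xml:lang (here, pálgò) takes the language of the enclosing line.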