Discussion of Dell’Oro 2020 & Vierros 2018

gabrielbodard commented 4 years ago

Francesca Dell'Oro, Helena Bermúdez Sabel & Paola Marongiu. 2020. “Implemented to Be Shared: the WoPoss Annotation of Semantic Modality in a Latin Diachronic Corpus.” Sharing the Experience: Workflows for the Digital Humanities. Proceedings of the DARIAH-CH Workshop 2019. Available: https://zenodo.org/record/3739440#.XzqoTZMzZTZ
Vierros, M. 2018. “Linguistic Annotation of the Digital Papyrological Corpus: Sematia.” In Nicola Reggiani (Editor), Digital Papyrology II: Case Studies on the Digital Edition of Ancient Greek Papyri. Berlin, Boston: De Gruyter. Pp. 105–118. Available: https://doi.org/10.1515/9783110547450-006

With both of these, please think about the aims and research questions behind the tools and methods discussed, rather than the technology and implementation.

chiaradimaio commented 4 years ago

The article by Francesca Dell'Oro and colleagues describes the project A World of Possibilities (WoPoss), which aims to track the evolution of modal meanings in the Latin language.

The authors explain how this analysis is carried out in three steps:

1) Modals meaning 'necessity', 'possibility' and 'volition' in Latin are first collected from a diachronic corpus that ranges from the 3rd century BCE to the 7th century CE and includes literary and documentary texts from different Latin-speaking regions of the ancient world. The selected texts are checked and confirmed to be philologically sound, so that they can be reused under a Creative Commons licence. All text files are then converted to plain text, with important structural information preserved through so-called pseudo-markup.

2) The tool INCEpTION (a multi-modular annotation platform) is customized for the needs of this project: recording the modal marker, its scope and the relation between them. The WoPoss team then carries out manual annotation, which is particularly useful in cases of ambiguity, since the description of such passages could allow future users to notice semantic shifts.

3) The annotated files are exported in XMI and transformed to conform to the TEI standard. Multiple layers of annotation are included: the most ancient meaning of each modal marker; the transformation of the pseudo-markup into the corresponding TEI elements; and metadata for each text concerning chronology, genre, transmission and authorship.
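To make the pseudo-markup step more concrete, here is a minimal sketch in Python of how lightweight structural markers in a plain-text file might be turned into nested TEI-style elements. The `#chapter 1` marker syntax and the `pseudo_to_tei` function are invented for this illustration; the article does not specify the WoPoss conventions at this level, and the real pipeline starts from the XMI export rather than raw text.

```python
import xml.etree.ElementTree as ET


def pseudo_to_tei(text: str) -> ET.Element:
    """Convert hypothetical pseudo-markup lines into TEI-style <div>/<p> elements."""
    body = ET.Element("body")
    current = body  # text before any marker is attached directly to <body>
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Assumed marker format: "#chapter 2" -> <div type="chapter" n="2">
            kind, _, number = line[1:].partition(" ")
            current = ET.SubElement(body, "div", {"type": kind, "n": number.strip()})
        else:
            paragraph = ET.SubElement(current, "p")
            paragraph.text = line
    return body


sample = (
    "#chapter 1\n"
    "Gallia est omnis divisa in partes tres.\n"
    "#chapter 2\n"
    "Apud Helvetios longe nobilissimus fuit Orgetorix.\n"
)
tei_body = pseudo_to_tei(sample)
print(ET.tostring(tei_body, encoding="unicode"))
```

The point of the sketch is only that structure kept as plain-text markers can be recovered mechanically later, which is why the conversion to plain text in step 1 does not lose information.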

The resulting TEI dataset will be freely accessible through a user-friendly interface. The whole WoPoss project is an open science product, stored in an open GitHub repository.

HLBallard44 commented 4 years ago

Vierros, M. “Linguistic Annotation of the Digital Papyrological Corpus: Sematia.” This article focuses on Sematia, a digital papyrological corpus under development, and on the annotation approach it has chosen.

Corpus Design

Corpus work in historical linguistics is usually concerned with how languages have developed and evolved through time. Sematia will be open-ended and include a corpus of the Greek used in documentary papyri over a period of about a thousand years.
The users of Sematia will be able to decide what they want to annotate or include in their searches. In this way they can contribute to the corpus as a whole, and researchers can select their own subcorpus and run queries on it, which makes it easier to repeat the research and obtain consistent results.
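As a small illustration of why an explicitly defined subcorpus helps repeatability, the sketch below selects documents by date range and place from a list of records. The `Document` fields, the identifiers and the dates are all made up for the example and do not reflect Sematia's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Document:
    doc_id: str  # hypothetical identifier, not a real papyrus reference
    year: int    # negative values stand for BCE
    place: str


def select_subcorpus(docs: list[Document], start: int, end: int,
                     place: Optional[str] = None) -> list[Document]:
    """Return documents dated within [start, end], optionally from one place.

    Sorting by identifier makes the selection deterministic, so the same
    criteria always yield the same subcorpus in the same order.
    """
    selected = [
        d for d in docs
        if start <= d.year <= end and (place is None or d.place == place)
    ]
    return sorted(selected, key=lambda d: d.doc_id)


docs = [
    Document("doc-a", -150, "Pathyris"),
    Document("doc-b", 120, "Oxyrhynchus"),
    Document("doc-c", 250, "Oxyrhynchus"),
]
subcorpus = select_subcorpus(docs, 100, 300, place="Oxyrhynchus")
print([d.doc_id for d in subcorpus])
```

If the selection criteria are recorded alongside the results, another researcher can rebuild the same subcorpus and rerun the same query.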

How to Annotate Papyri

Sematia will provide the "basic" level of annotation in the hope that the whole corpus will eventually be annotated. Current annotation includes morphological and syntactic annotation using dependency treebanks on Arethusa, which is integrated into Sematia through an API.
It uses the Ancient Greek and Latin Dependency Treebank and the PROIEL treebank for Dependency Grammar. Annotation through Arethusa involves tokenization, followed by automatic lemmatization and morphological tagging, which have to be checked and corrected by a human annotator.
Sematia creates two annotated parallel layers of the same text. This allows researchers to study the version that has been preserved (the original layer) or to compare the preserved text with the standardized version, possibly creating a third layer called variation.
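One way to picture the relation between the layers is as a token-level comparison of the original and the standardized text, where the differences form the variation layer. The sketch below uses Python's standard `difflib` for this; it is not Sematia's implementation, and the function name and the three-word sample (with the spelling χαίριν for standard χαίρειν) are chosen only for illustration.

```python
from difflib import SequenceMatcher


def variation_layer(original: list[str], standardized: list[str]) -> list[tuple[str, str]]:
    """Pair each stretch of the original layer with its differing standardized form."""
    matcher = SequenceMatcher(a=original, b=standardized, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record only the places where the two layers disagree.
            variants.append((" ".join(original[i1:i2]), " ".join(standardized[j1:j2])))
    return variants


original = ["χαίριν", "καὶ", "ἐρρῶσθαι"]
standardized = ["χαίρειν", "καὶ", "ἐρρῶσθαι"]
variants = variation_layer(original, standardized)
print(variants)
```

Keeping both layers fully annotated, rather than storing only the differences, is what lets a researcher query the language as written as well as the editorially regularized form.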

Metadata and Its Purpose

The date and place of origin of each papyrus are imported automatically into Sematia from the PN (Papyrological Navigator) metadata fields. Soon, PN metadata will also cover aspects of handwriting and distinguish writers from authors, making it possible to identify texts written by the same hand, to study idiolects, and to compare writers with one another.

The goals for Sematia are to make the whole papyrological corpus available, to support phonological searches, and to provide an automatic morphological parser for Greek.

nicolealexandra33 commented 4 years ago

It would be interesting to see how Sematia does with the PGM, especially as multiple languages are used (although as I understand it, Latin and Greek are currently the only ones that the software can process). It could maybe help determine which spells were translated or taken from another linguistic tradition, even though the entry is in a different language.

chiaradimaio commented 4 years ago

With regard to the background of the authors, it is worth saying that Francesca Dell'Oro currently teaches linguistics but has a background as a classical philologist, while Marja Vierros is a classical philologist and a papyrologist. Both are mainly interested in the historical and developmental aspects of ancient languages. Their tools are highly customized and specialised, but they share the same aim: offering specialists reliable corpora that make it possible to visualize information about the diachrony of particular linguistic phenomena.
