acl-org / acl-anthology

Data and software for building the ACL Anthology.
https://aclanthology.org
Apache License 2.0

Correction to Anthology ID 2022.digitam-1.3 #2234

Closed by EVanElverdinghe 1 year ago

EVanElverdinghe commented 1 year ago

1: Metadata correction

2: Revision or erratum

This correction aims to resolve a mix-up that arose when the 2022 DigitAm workshop was submitted to the ACL Anthology. Papers 1.3 and 1.4 were swapped, and preliminary versions were sent instead of the final versions approved by the authors.

2022.digitam-1.3.pdf

EVanElverdinghe commented 1 year ago

Corrected abstract: The colophons of Armenian manuscripts constitute a large textual corpus spanning a millennium of written culture. These texts are highly diverse and rich in terms of linguistic variation. This poses a challenge to NLP tools, especially considering the fact that linguistic resources designed or suited for Armenian are still scarce. In this paper, we deal with a sub-corpus of colophons written to commemorate the rescue of a manuscript and dating from 1286 to ca. 1450, a thematic group distinguished by a particularly high concentration of words exhibiting linguistic variation. The text is processed (lemmatization, POS-tagging, and inflectional tagging) using the tools of the GRE_g_ORI Project and evaluated. Through a selection of examples, we show how variation is dealt with at each linguistic level (phonology, orthography, flexion, vocabulary, syntax). Complex variation, at the level of tokens or lemmata, is considered as well. The results of this work are used to enrich and refine the linguistic resources of the GRE_g_ORI project, which in turn benefits the processing of other texts.