Discussion of Yousef 2019 & Pataridze 2018

gabrielbodard commented 3 years ago

Please discuss the following readings in this thread:

Tariq Yousef (2019), "Ugarit: Translation Alignment Visualization". LEVIA’19: Leipzig Symposium on Visualization in Applications 2019. Leipzig. Available: https://osf.io/thsp5.
Tamara Pataridze & Bastien Kindt (2018). "Text Alignment in Ancient Greek and Georgian: A Case-Study on the First Homily of Gregory of Nazianzus." Journal of Data Mining and Digital Humanities. Available: https://jdmdh.episciences.org/4182/pdf

Kiamanx commented 3 years ago

Tariq Yousef (2019), "Ugarit: Translation Alignment Visualization". LEVIA’19: Leipzig Symposium on Visualization in Applications 2019. This article introduces Ugarit, a browser-based tool for bilingual translation alignment of parallel texts.

Intro: The introduction establishes the status quo of bilingual text alignment. It states that manual alignment produces more accurate results than machine alignment, but is far more costly and time-consuming. Like many other programs of its kind, Ugarit was developed out of the need for a more 'user-friendly' interface that offers all the tools necessary for text alignment and lowers the skill barrier to doing it.

The lack of user-friendly annotation tools for translation alignment and the need for accurate training data to perform the automatic alignment drove us to create Ugarit

Languages and development: Ugarit was developed with three main languages in mind: Latin, Persian, and Ancient Greek. The reason was that "few to none aligned (translation) data sets exist" for these languages. Development of Ugarit began in 2017 at the University of Leipzig, supervised by Prof. Gregory Crane, in collaboration with Chiara Palladino and Maryam Foradi.

Ugarit is crowd-sourcing project enables users to create translation alignments at word level, and the resulting translation pairs can be re-used in future machine or human translations or to create dynamic lexica and translation memory.

Workflow: Ugarit was developed for users with no experience of translation alignment. Any member of the public can log in and create translation alignments in an easy-to-understand way. Once users have created their pairs, these are stored in the site database, saved to the user's profile, and can be exported as XML. To make the experience easier for a layperson, the site is also as visual as possible: users can select from a colour-coded tree of language subgroups on the home page.
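To make the "pairs exported as XML" step more concrete, here is a minimal sketch of how word-level translation pairs could be represented and serialised. The element and attribute names (`alignment`, `pair`, `source`, `target`) and the `TranslationPair` class are hypothetical illustrations, not Ugarit's actual export schema, which the article summary above does not specify.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class TranslationPair:
    """One aligned unit: one or more source tokens linked to target tokens."""
    source_tokens: List[str]
    target_tokens: List[str]


def pairs_to_xml(pairs: List[TranslationPair], source_lang: str, target_lang: str) -> str:
    """Serialise translation pairs to an XML string (hypothetical schema)."""
    root = ET.Element("alignment", {"source": source_lang, "target": target_lang})
    for pair in pairs:
        pair_el = ET.SubElement(root, "pair")
        # A pair may be one-to-many, so each token gets its own element.
        for token in pair.source_tokens:
            ET.SubElement(pair_el, "source").text = token
        for token in pair.target_tokens:
            ET.SubElement(pair_el, "target").text = token
    return ET.tostring(root, encoding="unicode")


# Usage: one Greek word aligned to two English words.
example = [TranslationPair(["μῆνιν"], ["the", "wrath"])]
print(pairs_to_xml(example, "grc", "eng"))
```

Keeping each pair as lists of tokens on both sides allows one-to-many and many-to-many links, which word-level alignment between differently structured languages usually needs.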

Ugarit contains a huge amount of English-Ancient Greek automatic aligned texts at word level provided be (sic) Perseus Digital Library

Conclusion and the future: Future versions are planned to include roles for contributors (e.g. expert, instructor, student) that would "help create accurate training data", as experts and instructors would correct the work done by students. Other ideas include classes, collaborative alignments, deadlines for assignments, saving in other formats, and unique URLs for texts.

despinaborcea commented 3 years ago

Tamara Pataridze & Bastien Kindt (2018). "Text Alignment in Ancient Greek and Georgian: A Case-Study on the First Homily of Gregory of Nazianzus." Journal of Data Mining and Digital Humanities. Available: https://jdmdh.episciences.org/4182/pdf

In this article, the authors set out to analyse the linguistic and translational issues that arise from aligning a lemmatised bitext of Gregory of Nazianzus’ Oratio I (written in Ancient Greek) and its translation into Ancient Georgian. Their work is part of the GREgORI Project.

Regarding the methodology, the authors use word alignment between the Oratio I, defined as the ‘source-text’ (ST), and its rendition in Georgian by Ephrem Mtsire, the ‘target-text’ (TT). In terms of corpora, they start from data already available via the Thesaurus Sancti Gregorii Nazianzeni, which contains the lemmatised version of Gregory of Nazianzus’ text. For the Georgian equivalent, they use data from thirteen homilies, with a specific focus on Mtsire’s translation because of its literal character. The words in the corpora are then tagged and aligned. While performing the alignment, the authors observe that Ephrem Mtsire often follows the Greek word order in his Georgian, with some exceptions.

The result of the analysis is defined as a ‘bitext’, which is then uploaded to the GREgORI Project’s directory in order to create a bilingual dictionary and to aid the understanding of translation methods across different cultures. The authors justify the need for this particular alignment by arguing that traditional scholarly methods, outside the sphere of Digital Humanities, would not be able to extract the information as easily or as accurately, given that Ancient Greek and Ancient Georgian are very different languages, from linguistic roots to structure. Moreover, software tools for Georgian are very limited.
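As a rough illustration of how an aligned, lemmatised bitext can feed a bilingual dictionary, the sketch below counts how often each source lemma is aligned to each target lemma. This is an assumed, simplified approach and not the GREgORI Project's actual pipeline; the function name `build_lexicon` is hypothetical, and the target-side strings are invented placeholders rather than real Georgian lemmas.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_lexicon(
    aligned_lemmas: Iterable[Tuple[str, str]],
) -> Dict[str, List[Tuple[str, int]]]:
    """Group aligned (source lemma, target lemma) pairs into dictionary entries.

    Each entry lists the target lemmas observed for a source lemma,
    most frequent first, with the number of times each was aligned.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for source_lemma, target_lemma in aligned_lemmas:
        counts[source_lemma][target_lemma] += 1
    return {src: c.most_common() for src, c in counts.items()}


# Placeholder data: Greek lemmas paired with made-up target labels.
pairs = [
    ("ἡμέρα", "target_lemma_1"),
    ("ἡμέρα", "target_lemma_1"),
    ("ἡμέρα", "target_lemma_2"),
    ("ἀνάστασις", "target_lemma_3"),
]
for lemma, renderings in build_lexicon(pairs).items():
    print(lemma, renderings)
```

Working with lemmas rather than inflected forms is what lets renderings of the same word be grouped into one entry, which matters for two heavily inflected languages.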

In conclusion, the authors set out the further steps following this study, namely annotating the remaining 12 homilies, and once more stress the importance of this morpho-syntactical annotation as a complement to previous research in translation studies between Ancient Greek and Eastern Christian texts.

nicolealexandra33 commented 3 years ago

It would be interesting to see how text alignment would work with poetry or songs rather than prose, because direct translations aren't necessarily found there, as they would not fit the meter. When you have more metaphorical translations than direct ones, I wonder how that might change the lexicon or dictionary. It might be worth adding a separate category for these translations.

LauraHead commented 3 years ago

Regarding the article on the Ugarit tool, I think the idea of potentially developing roles for content providers, as has been mentioned (‘instructor’, ‘student’ etc.), is interesting in terms of the discussion of the reliability of Wikidata sources from previous weeks. The creator of the tool would be circumventing some of the problems of community editing and reliance on policing by consensus, without preventing more amateur users from contributing and using the software. It would be interesting to see whether this idea could be extended to other Wikidata projects.

HLBallard44 commented 3 years ago

Would there be any downsides to developing different user roles on other crowdsourcing platforms? Going off Laura's comment, I believe creating different user roles would help differentiate data input and make the information more accurate. At this point I cannot think of a downside to this idea, but I am curious about others' opinions.

despinaborcea commented 3 years ago

I agree with Hannah: I cannot see any general downsides, just two general points that I believe apply to any field of research. The first is one of the points made about voluntary contributions on Wikipedia, which would apply here too and concerns financial matters (this, of course, depends on the crowdsourcing platform): how could user roles be assigned and financially compensated fairly? It sometimes happens (again, not only in Digital Humanities and Classics) that a more junior user's contributions (e.g. a student's) are voluntary. The second, in the sphere of Classics, concerns traditional scholars who are not familiar with digital tools and may therefore be reluctant to use crowdsourcing platforms. Some of the best contributions could thus be left out, which raises the question of how representative a platform is of a particular topic if the experts in the field are not involved. This is obviously not the norm, but could be the case.

gabrielbodard commented 3 years ago

Who decides who is more vs less reliable as a contributor?
Do we have to eliminate anonymous/pseudonymous contributors for this to work?
What about people who don't fit into either category (teacher/student/academic/expert…)?
How do we flag edits by different categories of contributor for transparency for the reader?
Would lower status contributors be disincentivised from participating?

HLBallard44 commented 3 years ago

Continuing off Despina's point about voluntary contributions, crowdsourcing platforms rely on what is essentially deemed "free labor." This was also discussed in Perry & Beale's article "The Social Web and Archaeology’s Restructuring." Wikimedia, Recogito, and Ugarit depend on the community to use their computational technology to obtain more data. Scholars who are more knowledgeable in the field may provide some of the data, but unless their research corresponds directly with translation technology, they may not have time to input data into a platform such as Ugarit.

Kiamanx commented 3 years ago

While working on this I was sure my basic-level Pahlavi would not yield 100% accurate results, and it didn't. The vast majority of it was (I believe) accurate, but I feel like something as access-friendly as this site needs... no... requires a peer review system from others, like Wikipedia.
