The article by Stover and Kestemont focuses on the complex attribution of the Historia Augusta (HA), a late Latin historiographical work that covers the period from the reign of Hadrian to that of Numerian (i.e. 117-284 AD). This collection of biographies is an essential source for some historical periods for which other evidence is scarce, but it often shows incoherence and anachronisms that have made scholars doubt its authenticity. The manuscript tradition claims the work has six different authors, which makes it an interesting case study in stylometry. In 1889, Dessau first proposed attributing it to a single author, presumably writing in the time of Theodosius; many others (including Syme) endorsed his position.

A short summary of the state of the text is necessary to understand the article. The HA is characterized by a lacuna (covering the period between Gordian III and Valerian) that corresponds to a natural division of the text: the lives before the lacuna were assigned to four of the six authors, and those after it to the other two. This division also corresponds to a change in the reliability of the historiographical information: the second part, according to Mommsen, corresponds to the Nebenviten (secondary lives), which are full of fictional, unreliable facts. The first part comprises the so-called Hauptviten, believed to be based on the work of Marius Maximus (or of an Ignotus, as Syme claims).

Since the late 1970s, analytical studies have helped to address the problem of single versus multiple authorship. It must be said that the computational approach has not always produced the expected results, partly because in its early days it was marred by methodological errors, such as using sentence length as a criterion of authorship. Digital scholars have tried using the General Imposters (GI) framework, which nevertheless shows how difficult it is to compare documents addressing the same topic (because shared subject matter can increase stylistic similarity).
The procedure splits the HA into samples of 1,000 words, removes all punctuation, and then has the program compare each sample with the candidate authors for whom reference material is available (authorship verification). PCA (principal component analysis) can help to visualize the authorial layers better by plotting the samples on a scatterplot. It does not require author labels as input; it therefore produces a clearer picture, uninfluenced by previous studies and opinions. The results of this research are:
• the corpus displays two different authorial signals: the authors before the lacuna can be distinguished from each other, whilst the two after it differ only marginally from each other
• the lacuna corresponds to a stylistic break
• the structure of six authors presented by the manuscripts is not acceptable; on the contrary, certain stylistic tics point towards single authorship
• it is likely that a single author incorporated an earlier source, making few changes in the lives of the senior emperors and adding new material to those of the junior ones.
Cameron observed that ‘the quantitative method is unlikely to give definitive results’. But the striking outcome is that there is no conflict between the results of the traditional and the digital research on this subject.
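As a rough illustration of the sampling step summarised above (fixed-size samples with punctuation removed), here is a minimal Python sketch. It is not the authors' code: the function names, the toy text, the sample size of 10, and the choice of word frequencies as features are all assumptions made for the example, since the summary does not say which features the article measures.

```python
import string
from collections import Counter


def make_samples(text, size=1000):
    """Strip ASCII punctuation, lowercase, and cut the text into consecutive samples of `size` words."""
    table = str.maketrans("", "", string.punctuation)
    words = text.translate(table).lower().split()
    # Keep only full-length samples so that all samples are comparable.
    return [words[i:i + size] for i in range(0, len(words) - size + 1, size)]


def relative_frequencies(sample, vocabulary):
    """Relative frequency of each vocabulary word within one sample."""
    counts = Counter(sample)
    return [counts[w] / len(sample) for w in vocabulary]


# Toy input: placeholder Latin, not the Historia Augusta itself.
toy_text = "et in urbe, non in agro; et sic non est. " * 3
samples = make_samples(toy_text, size=10)  # 10 instead of 1000 for the toy text
vocabulary = ["et", "in", "non"]  # hypothetical feature list
profiles = [relative_frequencies(s, vocabulary) for s in samples]
```

Each row of `profiles` is one sample's frequency profile. A table of this kind is the sort of input that PCA or a verification method could then take, without any author labels attached.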
My general understanding of these two pairs of authors is that generally, one in each pairing is more formally trained in computational sciences and digital humanities, while their counterparts are humanities researchers, trained in a more ‘traditional’ sense: Kestemont is a digital classicist who is experienced with coding, while Stover focuses more on classical texts and philosophy in the traditional sense. Meanwhile, Marton Ribary’s research interests involve the comparative study of legal texts, particularly between Roman and Rabbinic traditions, while Barbara McGillivray is trained as a computational linguist. In this way, the combinations of authors probably help to balance projects, research aims, and methods, and allow for a more holistic perspective. It is also clear from the pairs of authors, as well as from the questions they are attempting to answer, that these methods allow for a greater intersection of disciplines: for example, Ribary and McGillivray also argue that their project represents an intersection between classics and legal studies.
Marton Ribary & Barbara McGillivray. 2020. “A Corpus Approach to Roman Law Based on Justinian’s Digest.” Informatics 7, 44. Available: https://doi.org/10.3390/informatics7040044
In this article, the authors discuss the creation and purpose of a ‘relational database’ of Justinian’s Digest using Python. Having divided the Digest’s sections into groups according to their respective linguistic profiles, the authors were able to draw the conclusion that training for lawyers was intended for practical purposes, rather than abstract and philosophical ones.
With regard to methodology, Ribary and McGillivray use a corpus approach to analyse and display the features of Roman law vocabulary in Justinian’s Digest. Firstly, they make the text machine-readable by building a ‘relational database’ drawing on ROMTEXT. Secondly, they apply computational hierarchical clustering to words in order to group them according to the semantic field they pertain to (e.g. a cluster could be all words thematically relating to the offence of theft). Words in these categories are given in their lemmatised form. Then, they remove high-frequency words that do not belong to any of the thematic clusters (for example, connecting words such as conjunctions, or frequently used verbs such as sum). The clustering method is used to reveal an empirical structure in Roman law, as opposed to the idea that legal training was philosophical and abstract. Finally, Ribary and McGillivray use fastText to identify ‘semantic splits’ in the vocabulary of the Digest, that is, firstly, to find words that have one meaning in a general context and a different one in a legal context, and secondly, to find a way of identifying these splits computationally, in order to better understand how language is transformed into technical (in this case, legal) terminology.
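To make the clustering step described above more concrete, here is a minimal, self-contained Python sketch of bottom-up (agglomerative) hierarchical clustering after removing high-frequency words. It is only an illustration under stated assumptions: the lemma lists, the stopword set, the Jaccard distance, and the average-linkage rule are all chosen for the example and are not taken from the article or from ROMTEXT.

```python
from itertools import combinations

STOPWORDS = {"et", "sum", "in"}  # hypothetical high-frequency words to drop

# Toy "sections" as lists of lemmas; invented for illustration, not taken from the Digest.
sections = {
    "A": ["furtum", "fur", "res", "et", "sum"],
    "B": ["furtum", "res", "actio", "in"],
    "C": ["heres", "testamentum", "legatum", "et"],
    "D": ["heres", "legatum", "sum"],
}


def jaccard_distance(a, b):
    """1 minus the share of lemmas that two sets have in common."""
    return 1 - len(a & b) / len(a | b)


def agglomerate(items, n_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters (average linkage)."""
    clusters = [[name] for name in items]
    while len(clusters) > n_clusters:
        def linkage(pair):
            left, right = pair
            dists = [jaccard_distance(items[x], items[y]) for x in left for y in right]
            return sum(dists) / len(dists)
        left, right = min(combinations(clusters, 2), key=linkage)
        clusters.remove(left)
        clusters.remove(right)
        clusters.append(sorted(left + right))
    return sorted(clusters)


# Drop the stopwords first, then cluster what remains.
filtered = {name: set(lemmas) - STOPWORDS for name, lemmas in sections.items()}
clusters = agglomerate(filtered, n_clusters=2)
```

In this toy data the theft-related sections (A, B) and the inheritance-related sections (C, D) end up in separate clusters, which is the kind of thematic grouping the summary describes.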
N.B.: I chose to occasionally quote direct formulations from the article because using synonyms for specialised language could have distorted the meaning.
A little extra background on the authors of these papers.
Justin A. Stover is currently a professor at All Souls College, Oxford. He specialises in Classical tradition, Platonism, Humanism, and Medieval Latin. His publications concern textual reconstruction and Platonic tradition.
Mike Kestemont is an academic researcher at the University of Antwerp, Belgium. He has his own website highlighting his interests in Digital Humanities and "how digital methods and computation can support and enhance traditional forms of research and teaching." His programming language of choice is Python and his current interests lie in machine learning. (He also has an inspirational quotes section on his website, neat.)
As for the authors of “A Corpus Approach to Roman Law Based on Justinian’s Digest,” Marton Ribary is currently a research fellow at the University of Surrey. He has an MA in Library and Information Studies, an MA in Philosophy with Classics and Hebrew, and a PhD in Middle Eastern Studies. He has also studied AI and Law at the European University Institute and Digital Humanities at Oxford. Barbara McGillivray has a degree in Mathematics and Classics and a PhD in Computational Linguistics. She is interested in data science and digital humanities, Latin and Ancient Greek linguistics, corpus linguistics, computational and historical linguistics, and natural language processing for humanities research.
This is just basic background information. Nicole has compared the two sets of authors from each article and discussed their similarities and differences.
@nicolealexandra33:
My general understanding of these two pairs of authors is that generally, one in each pairing is more formally trained in computational sciences and digital humanities, while their counterparts are humanities researchers, trained in a more ‘traditional’ sense
This is a good point. Slightly more true for Kestemont-Stover, I think (both McGillivray and Ribary have some technical skills and philological training as well, although the general pattern is as you describe it).
What does this say about the value of collaboration and the skills/skillsets needed for this kind of research?
One interesting aspect of the articles we are now analyzing could be the fact that both of them focus on late Latin texts, characterized by a stratification of the material. This makes me think that the computational approach could sometimes be preferable for scholars who have to deal with complex challenges in the organization and the transmission of the texts they are studying. When it comes to 'canonical', widely studied authors, the debate seems more concerned with the semantics of their language, which is most of the time peculiar to them: evaluating their words (especially poetic ones) in terms of 'quantity' might not provide satisfactory results beyond the statistical aspect. Late Latin is much more sclerotized in its structure and based on codified models: this could be an advantage when it comes to making the text machine-readable.
I hope I was able to express my point clearly. I think everybody has already pointed out well the background of the scholars themselves and how this could affect the outcomes of their research.
What does this say about the value of collaboration and the skills/skillsets needed for this kind of research?
Two minds are better than one. When doing research that requires someone to be knowledgeable both in ancient texts (or philosophy, or Roman law, etc.) and in computational methods, it may be prudent to have more than one researcher. A Classicist may have a basic understanding of Digital Humanities/Classics, but collaborating with a Digital Classicist or Computational Linguist will allow them to further their research in a way they may not have been able to by themselves. In the case of McGillivray and Ribary, they both have knowledge of digital humanities, but Ribary has also been educated in law and Middle Eastern studies, while McGillivray has studied classics and computational linguistics. Both have skills that complement each other and aid their research.
To add to everyone’s comments on the benefits of collaborative research partnerships such as the two pairs discussed here, it struck me that some of the difficulties surrounding the introduction of digital methods to Classical scholarship that previous weeks’ reading material has highlighted find their resolution here. For example, (Rosselli – week 8 I think) discussed the problem of digital research methods neither being taken full advantage of by traditional scholars nor always being given appropriate recognition, due to a kind of lingering scepticism within the discipline. That schism seems to be very effectively bridged in these two articles, where the project is not ‘factional’ but truly collaborative, if this makes sense. This obviously then allows more fruitful and engaged results.
Useful comments on collaborative work. It's worth pointing out that a lot of the projects we have seen (in this session and others) have been joint endeavours, with at least one participant having more hardcore technical skills and computational grounding, and at least one having more profound philological background. This is how I have almost always worked as well (although I fall somewhere between the two extremes, and sometimes have worked as the bridge between two less polymath scholars). There are also people who are able to span both camps, as we have seen, who may have doctorates in "Digital Humanities" and both philological and computational expertise. It would be interesting to unpick some of the differences between these approaches.
As always, think about questions including the provenance of the paper (author, venue, context, etc.), the research background, disciplinary questions, the bigger picture, etc. Also if you have any questions or anything isn't clear, please ask below.