SASDigitalHumanitiesTraining / TextEncoding

Text Encoding for Ancient and Modern Literature, Languages and History

Discussion of Recogito annotation exercise #4

Closed gabrielbodard closed 3 years ago

gabrielbodard commented 3 years ago

Please use this thread to discuss your decisions, experiences and any issues or questions that occurred to you concerning the Recogito annotation exercise. You may like to think in particular, but not exclusively, about the following questions:

  1. Did you find the simplicity of the tags-free interface in Recogito contrasted with the transparency of adding Markdown tags directly in text?
  2. What do you want to do with the annotations? In what format? For what audience? How does that change the decisions you made about what/how to annotate?
  3. Did you learn anything from visualising the data—either the process or the result—that you would not have learned from studying the original document in a conventional way?
  4. A different person will annotate these documents differently, perhaps because of different interests, knowledge of context, or approach to annotation. Does that matter?
  5. How much did the automated NER help/hinder the process of annotating your text? Were there things you wanted to annotate that Recogito did not enable?
gabrielbodard commented 3 years ago

A reminder, the three documents you want to work with, and the groups assigned to them, are as follows:

  1. Document 1: Anabasis RAW (Groups 1, 4, 7, 10)
  2. Document 2: Anabasis NER (Groups 2, 5, 8, 11)
  3. Document 3: Al-Idrisi’s Tabula Rogeriana (Groups 3, 6, 9)
hannah-sonbol commented 3 years ago

Here are some of my thoughts on the questions:

  1. Benefits of Visualization: Visualising in this format of course changes the interpretation, because it makes the text accessible in a different way. It also helps in creating statistics.

  2. Difference of author: As a non-European, and hence not knowledgeable about Classical studies (Italy / Greece / Turkey), I felt a bit lost tagging all these Greek places. It needed research, and it plainly shows why some things should be done by specialists (-> Wikipedia is limited and, at the end of the day, non-scientific).

  3. Visualisation-manipulation: I also found it a bit disturbing that you could not mark whole countries differently from small "cities" (although: up to what borders?). For example, "Egypt" appeared on the visualized graphic as a pinpoint, just like "Milet".

gabrielbodard commented 3 years ago

@hannah-sonbol You make a good point about the specialist knowledge required, but I was assuming most people would not be classicists, and would have to get a sense of correct identification of places from looking at the little map visualisations and variant spellings. Also once you visualise the annotation in the map view in Recogito, any strange outliers may be a clue that something is misidentified.

Re borders: the gazetteers that we're using for historical places usually don't draw borders per se—they're not always appropriate in the way they are for modern countries. What you saw for Egypt for example was probably an "indicative centre point" of an ancient province without clear (or known) borders. If you normalise to Egypt or Greece in the Geonames gazetteer, for example, it would give you the borders of the modern countries, in both cases incorrectly for the text. (Some provinces may have borders, e.g. within Asia Minor.)
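
The "strange outliers" check mentioned above can also be approximated outside the map view. The following is only a minimal sketch, assuming you already have each annotation's resolved coordinates as (name, latitude, longitude); the function name, the 15-degree threshold and the sample coordinates are illustrative assumptions, not anything Recogito provides.

```python
from statistics import median

def flag_outliers(places, max_degrees=15.0):
    """Return names of places lying far from the median centre of the set.

    `places` is a list of (name, lat, lon) tuples. The threshold is a crude
    box in degrees, chosen only for illustration.
    """
    lat_mid = median(lat for _, lat, _ in places)
    lon_mid = median(lon for _, _, lon in places)
    return [
        name
        for name, lat, lon in places
        if abs(lat - lat_mid) > max_degrees or abs(lon - lon_mid) > max_degrees
    ]

# Approximate, illustrative coordinates. The last entry is a made-up point
# in North America standing in for a wrong gazetteer match.
annotations = [
    ("Sardis", 38.49, 28.04),
    ("Miletus", 37.53, 27.28),
    ("Abydos", 40.19, 26.41),
    ("Colossae", 37.79, 29.26),
    ("Sardis (misidentified)", 34.0, -90.0),
]

suspects = flag_outliers(annotations)
print(suspects)
```

A flagged name is only a prompt to look again at the identification; a text that genuinely ranges widely will produce legitimate "outliers" too.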

bgarnand commented 3 years ago

Did you find the simplicity of the tags-free interface in Recogito contrasted with the transparency of adding Markdown tags directly in text? I very much appreciated the simplicity, but would have liked more control, e.g. I have a series of inscriptions that I would like to map (similar to the Pompeii example) with N-E coordinates, some of which are not yet on Pelagios (nor DARE, Pleiades, TPPlace, TM Geo).

What do you want to do with the annotations? In what format? For what audience? How does that change the decisions you made about what/how to annotate? I have a digital project with 6 travel narratives digitized (more to come) where a simple place-people-event markup would cover most of my needs. It’s geared toward undergraduates, and I would want to link just those 6+ specific texts to each other (shared glossary/gazetteer) as much as to outside gazetteers, following best practices / shared standards.

How much did the automated NER help/hinder the process of annotating your text? I felt that there were enough skipped terms (Chersonese, Abydos, Colossae), half-skipped terms (marking only half for Marsyas and Maeander), mis-identified terms (Ionian Islands vs Ionia), and hard-to-correct entries (Sardis stubbornly stays in the US, for some reason), that doing it without NER would be more productive (keeping the global replace option). Also, why not adjectives (Peloponnesian, Megarian, Stymphalian etc)?

Were there things you wanted to annotate that Recogito did not enable? It could use a 'terms' field where key concepts could be defined/translated, or a 'conversion' field for weights and measures.

sergiobassocina commented 3 years ago
  1. "Did you find the simplicity of the tags-free interface in Recogito contrasted with the transparency of adding Markdown tags directly in text?" No.

  2. "What do you want to do with the annotations? In what format? For what audience? How does that change the decisions you made about what/how to annotate?" It would be great to link places on the map to Arabic historical books (portolans, chronicles). I would like to create subsets (“Recogito, please highlight all the cities on the Tabula Rogeriana that appear in Abu'l-Faraj ibn al-Jawzi’s chronicles”).

  3. "Did you learn anything from visualising the data—either the process or the result—that you would not have learned from studying the original document in a conventional way?" Yes. To me, the asset is the network with other scholars who are tagging the same document from the viewpoint of their own discipline. It is an amazing way of studying the same object from different perspectives in the same workspace – albeit virtually.

  4. "A different person will annotate these documents differently, perhaps because of different interests, knowledge of context, or approach to annotation. Does that matter?" Yes, and it is a plus (see #3). Also, when two taggings come into conflict, one can always disambiguate. Who wins? Is there a hierarchy in the annotating process? A moderator?

  5. "How much did the automated NER help/hinder the process of annotating your text?" It helped a lot, although I was wondering whether referring to one gazetteer rather than another will hinder future reference. Example: Ravenna before a certain date was Classis. I could legitimately tag Ravenna as Classis. However, the city later expanded beyond the harbor, and the name of the mainland settlement took over. So “Classis” is representative, but is not complete. The Pleiades gazetteer, for the sake of the Tabula Rogeriana, is chronologically wrong – maybe?

  6. "Were there things you wanted to annotate that Recogito did not enable?" It’s too soon to say. I would intuitively reason the other way round: a) Recogito is a tool for certain purposes; if I have those purposes, I use Recogito. If I have other purposes, b) are there alternatives around? c) Is Recogito “malleable”? Can one invent new tags or is it a closed system?

aghague commented 3 years ago

1. Did you find the simplicity of the tags-free interface in Recogito contrasted with the transparency of adding Markdown tags directly in text?

I'm not sure I would describe the differences as "contrasting", since I found using Markdown extremely easy to do. I'd say it's a bit more convenient when the gazetteers include the information one needs, but also a lot more fiddly when one encounters a gazetteer gap.

2. What do you want to do with the annotations? In what format? For what audience? How does that change the decisions you made about what/how to annotate?

In my own research, I am mainly using these kinds of annotations in two ways: to trace epistolary networks, and to map geographical references in specific literary texts. My annotations focus on disambiguating place / person names and, often, on adding historical context; subsequent analyses focus on the frequency with which specific locations or people are mentioned, on better understanding the distance between these locations, and on mapping the movement of ideas and people between them.
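
As a rough illustration of the two analyses just described (mention frequency and distance between locations), here is a minimal sketch. It assumes the annotations have already been reduced to a list of normalised place names with coordinates; the sample data and coordinates are illustrative only, and the haversine formula gives great-circle distance on a spherical Earth, which is an approximation.

```python
import math
from collections import Counter

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius, spherical approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Hypothetical sequence of normalised place mentions from one text.
mentions = ["Sardis", "Miletus", "Sardis", "Colossae", "Sardis", "Miletus"]
frequency = Counter(mentions)
print(frequency.most_common())

# Approximate coordinates, for illustration only.
coords = {"Sardis": (38.49, 28.04), "Miletus": (37.53, 27.28)}
distance = haversine_km(*coords["Sardis"], *coords["Miletus"])
print(f"Sardis to Miletus: about {distance:.0f} km as the crow flies")
```

Straight-line distance says nothing about the route actually travelled, so it is best read as a lower bound on the journeys described in a text.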

3. Did you learn anything from visualising the data—either the process or the result—that you would not have learned from studying the original document in a conventional way?

I find map visualisations very useful for conveying both proximity and distance easily and clearly. They are also very useful for conveying the scale on which movement occurs.

4. A different person will annotate these documents differently, perhaps because of different interests, knowledge of context, or approach to annotation. Does that matter?

Like many things, this can both help and hinder. Informed annotations from a variety of perspectives can deepen understanding, but they can also be distracting (i.e., they may make it hard to focus on the specific issues one is trying to investigate).

5. How much did the automated NER help/hinder the process of annotating your text?

When it gets the category/reference right, it is very useful. When it gets it wrong, it can be time-consuming to correct: see, for example, the misreading of the adjective "Greek" in the phrase "Greek hoplites" as a reference to a building identified as "Greek Theatre, Villa Adriana"; since I did not seem to be able to delete the reference, I had to do a bit of research to identify the specific Greek area or city-state from which those particular hoplites originated so I could link to it. It is also inconsistent: sometimes it would include the phrase "of [a specific location]" in an individual's name, and sometimes not; this can also be time-consuming to amend.

6. Were there things you wanted to annotate that Recogito did not enable?

Nothing else springs to mind at the moment.

gabrielbodard commented 3 years ago

As promised, I have made a small visualisation in Google Maps of the three sets of annotations that you created in the Recogito exercise. You can see on the left the key (different icons for the al-Idrisi map, and the hand-annotated and NER-annotated Anabasis; green for verified, red/pink for unverified place identifications), and you can also turn off and on the individual layers, to make it easier to see the contrast between them.

What do you notice? Apart from the fact that only six places were annotated on the al-Idrisi map, what are the main differences between the hand- and NER-annotated places in Xenophon? Do you see any obvious errors on the map?

How does any of this impact on the conversation we were having about annotation agendas and methods, above?

abigaillloyd commented 3 years ago

Hi Gabriel,

As promised, I am posting the query I emailed about. In the talk - I think it was the Recogito video tutorial from Sunoikisis Digital Classics (which I watched from the very beginning to the end) - it was mentioned that the Swedish Government(?) or a Swedish institution were interested in using Recogito for research work on medieval settlement names. I am currently working with the English Place-Name Society, and also doing a PhD involving place-names and medieval settlement in Britain. This was one of the reasons I attended your course, to see what was available and useful for extracting and analysing place-names from texts. My initial thoughts were (having looked at and tried out Recogito) that much of the data currently reflects a Classical Aegean-focussed world, but your mention of the Swedish work is much closer to home in terms of geography and historical period. Much of the work I do involves medieval Scandinavian comparators. I just wondered how advanced the Swedish work was and who was involved in it? It would be great to find out more info, if at all possible.

If you were able to bring this to the attention of anyone involved with Recogito who might be able to help, that would be great. Thanks so much, Abigail

sergiobassocina commented 3 years ago

Thank you @gabrielbodard! Utility of the Google map: I could see at first sight the oddity of such sites as Villa Adriana, the Romanian site and the Southern Nile site for the Anabasis (maybe the second and the third sites are actually cited by Xen). It is weird that the system didn't allow me to delete Villa Adriana, for example. Visualisation can be a disambiguating method, though. Can we arrange these diatopic markers in a diachronic order, so as to create a timeline of events?

valeriavitale commented 3 years ago

@abigaillloyd:

Hello Abigail, Glad to see you're interested in using Recogito. I think that the project you may be referring to is Norse World (https://www.uu.se/en/research/infrastructure/norseworld/). I am not sure if they are actually using Recogito, but it may be useful to get in touch with them and get an update on their work on Swedish historical places. You should find all contact information on the website. We may have also mentioned Umeå University, where they have installed their own local instance of Recogito, customising it with gazetteer data specific to their project on Athens and Pausanias.

Recogito can be used (and has been used) as a first step to extract place names from historical sources, and create a new gazetteer, leveraging the CSV download format. If you want to talk more about it, feel free to get in touch directly.
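
To give a rough idea of that workflow, here is a minimal sketch that collapses a downloaded annotation CSV into one gazetteer entry per place. The column names (`QUOTE_TRANSCRIPTION`, `TYPE`, `URI`, `LAT`, `LNG`) are an assumption about the export and should be checked against your own download; the `build_gazetteer` function, the sample rows and the `example.org` URIs are purely hypothetical.

```python
import csv
import io

def build_gazetteer(csv_text):
    """Group place annotations by URI, collecting the name variants seen.

    Rows that are not places, or that have no resolved URI, are skipped.
    """
    gazetteer = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        uri = row["URI"].strip()
        if row["TYPE"] != "PLACE" or not uri:
            continue
        entry = gazetteer.setdefault(
            uri,
            {"names": set(), "lat": float(row["LAT"]), "lng": float(row["LNG"])},
        )
        entry["names"].add(row["QUOTE_TRANSCRIPTION"])
    return gazetteer

# Hypothetical sample standing in for a downloaded file.
sample = """QUOTE_TRANSCRIPTION,TYPE,URI,LAT,LNG
Sardis,PLACE,https://example.org/place/sardis,38.49,28.04
Sardes,PLACE,https://example.org/place/sardis,38.49,28.04
Miletus,PLACE,https://example.org/place/miletus,37.53,27.28
Cyrus,PERSON,,,
"""

gazetteer = build_gazetteer(sample)
for uri, entry in gazetteer.items():
    print(uri, sorted(entry["names"]), entry["lat"], entry["lng"])
```

Keying on the URI means spelling variants of the same place end up in one entry, which is usually what a new gazetteer needs; places with no match in an existing gazetteer would need a separate pass to mint new identifiers.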

Your work at the English Place-Name Society sounds really interesting, I'd love to know more about it. Are you in contact with the Gough Map project?

Best, Valeria

abigaillloyd commented 3 years ago

Dear Valeria and Gabby,

Thanks so much. That is really helpful. I will discuss all of this with the Nottingham people currently behind the EPNS survey (which has been running in non-digital format for nearly 100 years but is looking to be a bit more digital and spatially, visually mapped). Yes, I do know some of the people involved with the Gough Project - not very well, but their sessions and seminars pop up quite a lot here in Oxford, so I have attended them when I can.

Re the Swedish reference - I have gone back and looked again at the video - it is the Sunoikisis DC Fall 2019 video at 93 minutes in (pretty much at the end - last thing discussed) when Johan Ahlfeldt says that the National Archives of Sweden are interested in using Recogito for their gazetteer of medieval settlement names. Do you or the other presenters happen to have any more info on that or the details of anyone I could get in contact with about it?

Thanks again so much and I may well be back in touch in the future! All the best, Abigail
