ChiaraPalladino / furesearch

Creative Commons Attribution Share Alike 4.0 International

Xenophon Ten Thousand - Week 3 #20

Closed CianColgan closed 3 months ago

CianColgan commented 4 months ago

Just letting you know I finished my third pass of Books 2, 3, and 4, fixing annotations as I went, so they are open for you to review. I put my personal list of the specific edge-case annotations that I needed to look over in the Xenophon folder. I also attached additional reasoning to some of them so that it's on record and can be looked at while you review.

I mainly opened this issue because I was hoping to begin cleaning data at the end of this week, and you'd mentioned you might have specific tools or techniques that would be helpful. Like I said, the data for Books 2 and 3 are up, so you can get an idea of what they look like and what cleaning would be best.

ChiaraPalladino commented 4 months ago

Hi Cian, I'll go through your annotations this week and will try some preprocessing with Book 2 to see to what extent an automatic approach works. I'll circle back to you in a couple of days.

ChiaraPalladino commented 4 months ago

Update: I have done a very rough cleanup of Book 2 and uploaded the export data in the repository. I didn't use a script to clean it but a very simple approach to separate the tags - the downside, however, is that some of the tags ended up in the wrong field, so this must be corrected manually.

How to correct the output manually

I still want to prepare an ad hoc script to clean these files and get a better output, but since Book 2 is quite straightforward I thought I'd give you something to do.

CianColgan commented 4 months ago

Revised the first rough cleanup and attached the file to this comment. I think I understood the system. I can't figure out why some of the links didn't transfer at all while the majority did, but I fixed it regardless.

I didn't do it in the cleanup, but one thing I think I might change is adding an "ambiguous: true/false" (and maybe the same format for casestudy too) column right after the "Visited or referenced" one, since I know those two tags will compete for space in the subsequent books.
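A minimal sketch of what adding those columns could look like, not the project's actual script. It assumes the cleaned CSV has a column holding the tags (called `Tags` here, with `|` as separator; both are hypothetical) and writes `"true"`/`"false"` columns immediately after "Visited or referenced". The sample rows are invented for illustration.

```python
import csv
import io


def add_flag_columns(rows, flags=("ambiguous", "casestudy"),
                     tags_column="Tags", after="Visited or referenced",
                     sep="|"):
    """Return new rows with one true/false column per flag, placed right
    after the `after` column. `tags_column` and `sep` are assumptions
    about the export layout, not known facts about the real files."""
    out = []
    for row in rows:
        # Collect the tags present on this row (empty or missing cells give an empty set).
        raw = row.get(tags_column) or ""
        tags = {t.strip().lower() for t in raw.split(sep) if t.strip()}
        new_row = {}
        for key, value in row.items():
            new_row[key] = value
            if key == after:
                # Insert the flag columns directly after the anchor column.
                for flag in flags:
                    new_row[flag] = "true" if flag in tags else "false"
        out.append(new_row)
    return out


# Invented sample data, only to show the shape of the result.
sample = io.StringIO(
    "Place,Visited or referenced,Tags\n"
    "Babylon,referenced,ambiguous\n"
    "Tigris,visited,\n"
)
flagged = add_flag_columns(list(csv.DictReader(sample)))
```

If the anchor column is missing from a file, this sketch leaves the rows unchanged, so it would be worth checking the header first.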

xenophonBook2_revisedFirstCleanup.csv

ChiaraPalladino commented 4 months ago

I have uploaded your revised file here.

I still haven't had a chance to work on the script, but hopefully I'll be able to finish it by tonight (EU time).

ChiaraPalladino commented 4 months ago

Update: the script is ready and I have used it to clean the Book 3 and Book 4 files.

Everything is in the repo folder: https://github.com/ChiaraPalladino/furesearch/tree/main/ancient-geographies/xenophon-anabasis

You should still have a look at the new CSV files to make sure that all the values are in the right place, and if you want to try running the Python script yourself you certainly can (I can tell you how or you can look it up, it's quite easy).
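As a rough aid for that manual check, here is a small sketch that lists rows where a column holds a value it should not. It is not the cleaning script in the repo. The allowed values (`visited`, `referenced`) and the sample rows are assumptions made for illustration; only the "Visited or referenced" column name comes from this thread.

```python
import csv
import io


def find_misplaced(rows, column, allowed):
    """Return (line_number, value) pairs for rows whose `column` value
    is not in `allowed`. Line numbers count the header as line 1."""
    problems = []
    for line_no, row in enumerate(rows, start=2):
        value = (row.get(column) or "").strip().lower()
        if value not in allowed:
            problems.append((line_no, value))
    return problems


# Invented sample: in the second data row a tag has landed in the wrong field.
sample = io.StringIO(
    "Place,Visited or referenced,Tags\n"
    "Babylon,referenced,\n"
    "Tigris,ambiguous,visited\n"
)
problems = find_misplaced(
    csv.DictReader(sample),
    column="Visited or referenced",
    allowed={"visited", "referenced"},  # assumed vocabulary
)
for line_no, value in problems:
    print(f"line {line_no}: unexpected value {value!r}")
```

The same function can be pointed at any other column with a closed vocabulary; rows it reports still need a human decision about where the value belongs.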

CianColgan commented 4 months ago

Progress Report:

Additionally: I had a little extra time on my hands, so I went ahead and redid all my work for Book 1 of the Anabasis as well, using the new annotation guidelines and bringing it up to date with the other Books the project addresses. It's shared with you and available for review. I posted a (much shorter) list of notable annotations alongside the others. I also ran the .csv of the Book 1 data through the data cleaner, and am currently in the process of manually revising the data. When it's done (likely early the week I get back), I will post it in the repo.

Up Next: Import all the data into ArcGIS Pro and begin mapping!