CopticScriptorium / corpora

Public repository for Coptic SCRIPTORIUM Corpora Releases

Circumstantial in ap.10.monbeg #54

Closed rillian closed 4 years ago

rillian commented 4 years ago

First, thank you for your work maintaining the corpus here. It's been a very helpful resource!

I had a question about the tagging in Apophthegmata Patrum Sahidic 10. In the line

ⲡ̇ⲥⲱⲙⲁ ⲉⲧϣⲟⲩⲱ̇ⲟⲩ ⲛ̇ⲧⲉⲡ̇ⲙⲟⲛⲁⲭⲟⲥ ⲉϥ̇ⲥⲱⲕ ⲛ̇ⲧⲉⲯⲩⲭⲏ ⲉϩⲣⲁⲓ̇ ϩⲛ̇ⲛ̇ϣⲓ̇ⲕ ⲛ̇ⲧⲉⲡⲉⲥⲏⲧ

the ⲉϥⲥⲱⲕ ⲛ̄ⲧⲉⲯⲩⲭⲏ is analysed as circumstantial. I'd understood circumstantial clauses were necessarily subordinate, but I don't see any main clause here, so I thought it must be focalising, emphasizing that "The dried-up body of the (fasting) monk draws the soul up from the depths."

The metadata says the segmentation has been checked, so I wanted to ask if this is an oversight or if there's some grammar I don't understand. There's certainly been some confusion on this point among my student group.

amir-zeldes commented 4 years ago

Thanks for reporting this, you're right - I'll make sure it's corrected in the next release. There's actually another error in that sentence - the 'et' is segmented wrong, but I see someone has already caught that one, as it's corrected in our source files. In general we have more data than we can realistically go over to get it all perfect, so if you ever notice something wrong you are definitely helping by letting us know.

I should add that even files with segmentation="checked" aren't 100% reliable, though much better than automatic ones. The most reliable files, which have been gone over meticulously, say segmentation="gold" and I'd be more surprised to see errors there (though it happens!).

ctschroeder commented 4 years ago

I wanted to chime in and say thank you, as well. Also, if you are interested in working on any documents (editing documents already published or new documents), please let us know.

Also, you can search in ANNIS for the metadata Amir mentions. Segmentation, tagging, and parsing all have fields registering automatic, checked, or gold to indicate the level of human correction. Here, for example, is a search for all circumstantial, focalizing, and relative converters in AP with gold-level tagging: https://corpling.uis.georgetown.edu/annis/?id=24ee9372-7896-4b69-bb88-539afd77850f

Last, these three converters (especially focalizing and circumstantial) are some of the most common human or machine errors (or disagreements) generally, in our experience.
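
For readers who work with exported document metadata rather than ANNIS, the correction levels described above could be used as a filter roughly as follows. This is only a minimal sketch: the field names (segmentation, tagging, parsing) and values (automatic, checked, gold) come from this thread, but the dictionary layout, the document names, and the helper functions `meets_level` and `filter_docs` are hypothetical and not part of any Coptic SCRIPTORIUM tool. Treating a missing field as "automatic" is also an assumption.

```python
# Hypothetical sketch: rank the correction levels mentioned in this thread.
# The ordering automatic < checked < gold follows the description above.
LEVELS = {"automatic": 0, "checked": 1, "gold": 2}


def meets_level(doc_meta, field, minimum):
    """Return True if doc_meta[field] is at least as reliable as `minimum`.

    Assumption: a missing or unrecognized value is treated as "automatic".
    """
    value = doc_meta.get(field, "automatic")
    return LEVELS.get(value, 0) >= LEVELS[minimum]


def filter_docs(docs, minimum="gold",
                fields=("segmentation", "tagging", "parsing")):
    """Keep only documents whose listed fields all reach `minimum`."""
    return [d for d in docs
            if all(meets_level(d, f, minimum) for f in fields)]


# Made-up example records; real metadata would come from the corpus files.
docs = [
    {"name": "doc_a", "segmentation": "gold",
     "tagging": "gold", "parsing": "gold"},
    {"name": "doc_b", "segmentation": "checked",
     "tagging": "automatic", "parsing": "automatic"},
]

gold_only = filter_docs(docs)
checked_segmentation = filter_docs(docs, minimum="checked",
                                   fields=("segmentation",))
```

With the made-up records above, `gold_only` keeps only `doc_a`, while `checked_segmentation` keeps both documents, since "gold" also satisfies a "checked" minimum.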

rillian commented 4 years ago

Thanks for clarifying and for the further encouragement. I hadn't appreciated what the various transcription levels meant. I'll keep a closer eye on the corpus edition of things we're reading!

ctschroeder commented 4 years ago

Great @rillian. I am going to close this issue. Please be in touch if you have other questions or corrections.

rillian commented 4 years ago

Confirmed fixed with the v4.0.0 release.