Tagging Rationale - Githubissues

ETCBC / genre_synvar

Genre labels in Text-Fabric format from syntactic variation project

MIT License


Tagging Rationale #1

Closed: jcuenod closed this issue 4 years ago

jcuenod commented 4 years ago

Could you add some of the logic guiding the tagging decisions to the readme? I assume it's somewhere in Martin's thesis but it seems like a good idea to keep that close to the data.

Also, it would be useful to list the genres that are tagged. I believe they are:

instruction
list
poetry
prophetic
prose

codykingham commented 4 years ago

That would be really great. I'm not privy to the exact criteria used. Maybe @MartijnNaaijer or @PeursenWTvan could add something here that we could append to the documentation. I think the labels were built manually.

jcuenod commented 4 years ago

My impression was that it was manual tagging but that makes it more important in my mind to understand what was guiding the taggers. If you don't have the criteria, that's another story...

codykingham commented 4 years ago

Added tagging values to readme in https://github.com/ETCBC/genre_synvar/commit/c383395ac56f2596a78bd512ae3fa82c3309ac48

codykingham commented 4 years ago

My impression is that the majority of cases are uncontroversial, given a coarse-grained approach. But there are probably a few spots (e.g. Isaiah 6) where I see a lot of room for interpretation. Would be great to get the criteria used.

jcuenod commented 4 years ago

I'm closing, assuming that it's going to be difficult to get the criteria.

codykingham commented 4 years ago

Hi James, let’s leave it open for now and see if Martijn or Wido has something to add. It’s an important question, and not too hard to at least get a basic answer!

MartijnNaaijer commented 4 years ago

Hi guys, we use three basic categories for complete books: prose, poetry and prophecy. So Genesis is prose, Psalms is poetry, Isaiah is prophecy, etc. This is not based on formal criteria, but on what you generally find in exegetical books. Within prose, we added lists, which are generally long lists of names of people, and instruction, which is a general name for (mainly) law texts, again generally based on what one finds in exegetical commentaries. Also, we changed "prose" to "poetry" where a piece of poetry is embedded in prose, such as in Genesis 49. So the whole dataset is pretty basic and often based on intuitive criteria. It was meant to be coarse-grained, to avoid drowning in tiny details, but feel free to improve!
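
For readers who want to see the shape of that scheme, here is a minimal sketch of "book-level default plus local overrides". It is not the project's tagging code (the labels were assigned manually). The names `BOOK_DEFAULTS`, `OVERRIDES` and `genre_for`, and the chapter-level granularity, are assumptions made for illustration. Only the Genesis, Psalms and Isaiah defaults and the Genesis 49 override come from the comment above. The Genesis 5 entry is a hypothetical example of a list.

```python
from typing import Dict, Tuple

# Book-level defaults: the three basic categories named in the thread.
# The label "prophetic" follows the genre list earlier in this issue.
BOOK_DEFAULTS: Dict[str, str] = {
    "Genesis": "prose",
    "Psalms": "poetry",
    "Isaiah": "prophetic",
}

# Local overrides keyed by (book, chapter). Chapter granularity is an
# assumption for this sketch; the real dataset may tag smaller units.
OVERRIDES: Dict[Tuple[str, int], str] = {
    ("Genesis", 49): "poetry",  # embedded poetry, mentioned in the thread
    ("Genesis", 5): "list",     # hypothetical example, not from the dataset
}


def genre_for(book: str, chapter: int) -> str:
    """Return the override for a passage if one exists, else the book default."""
    return OVERRIDES.get((book, chapter), BOOK_DEFAULTS[book])


if __name__ == "__main__":
    for ref in [("Genesis", 1), ("Genesis", 49), ("Isaiah", 6)]:
        print(ref, genre_for(*ref))
```

A lookup like this also makes the interpretive spots visible: a passage such as Isaiah 6 simply inherits the book default unless someone decides to add an override.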

jcuenod commented 4 years ago

Thanks!

codykingham commented 4 years ago

Thanks @MartijnNaaijer. I'll add this information to the readme before closing.

codykingham commented 4 years ago

I've added a methodology description to the readme in https://github.com/ETCBC/genre_synvar/commit/5ca08512c2d5070f189ed7485a93c12ad4daacd4. Thank you both for improving the dataset!