Closed by jcuenod 4 years ago
That would be really great. I'm not privy to the exact criteria used. Maybe @MartijnNaaijer or @PeursenWTvan could add something here that we could append to the documentation. I think the labels were built manually.
My impression was that it was manual tagging, but to my mind that makes it more important to understand what was guiding the taggers. If you don't have the criteria, that's another story...
Added tagging values to readme in https://github.com/ETCBC/genre_synvar/commit/c383395ac56f2596a78bd512ae3fa82c3309ac48
My impression is that the majority of cases are uncontroversial, given a coarse-grained approach. But there are probably a few spots (e.g. Isaiah 6) where I see a lot of room for interpretation. Would be great to get the criteria used.
I'm closing, assuming that it's going to be difficult to get the criteria.
Hi James, let’s leave it open for now and see if Martijn or Wido has something to add. It’s an important question, and not too hard to at least get a basic answer!
Hi guys, we use three basic categories for complete books: prose, poetry and prophecy. So Genesis is prose, Psalms is poetry, Isaiah is prophecy, etc. This is not based on formal criteria, but on what you generally find in exegetical books. Within prose, we added two further labels: lists, which are generally long lists of names of people, and instruction, which is a general name for (mainly) law texts. These are generally based on what one finds in exegetical commentaries. Also, we changed "prose" to "poetry" where a piece of poetry is embedded in prose, such as in Genesis 49. So the whole dataset is pretty basic and often based on intuitive criteria. It was meant to be coarse-grained, to avoid drowning in tiny details, but feel free to improve!
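To make the scheme described above concrete, here is a minimal sketch of how a book-level default with passage-level overrides could be looked up. It is not the code behind the dataset (the labels were assigned manually). The names `GENRES`, `BOOK_GENRE`, `OVERRIDES` and `genre_of`, the exact label strings, and the chapter-level granularity are all assumptions made for illustration. Only the three book examples and the Genesis 49 override come from the comment above.

```python
# Hypothetical sketch: names, label strings, and chapter-level granularity
# are assumptions, not the dataset's actual format.

# The five labels mentioned in the comment above.
GENRES = {"prose", "poetry", "prophecy", "list", "instruction"}

# Book-level defaults (only the three examples named in the thread).
BOOK_GENRE = {
    "Genesis": "prose",
    "Psalms": "poetry",
    "Isaiah": "prophecy",
}

# Passage-level overrides. Genesis 49 (poetry embedded in prose) is the
# one example given in the thread.
OVERRIDES = {
    ("Genesis", 49): "poetry",
}


def genre_of(book, chapter, overrides=OVERRIDES):
    """Return the override for (book, chapter) if present, else the book default."""
    genre = overrides.get((book, chapter), BOOK_GENRE[book])
    if genre not in GENRES:
        raise ValueError(f"unknown genre: {genre!r}")
    return genre


print(genre_of("Genesis", 1))   # prose
print(genre_of("Genesis", 49))  # poetry
print(genre_of("Isaiah", 6))    # prophecy
```

Under a coarse-grained scheme like this, a debatable passage such as Isaiah 6 simply inherits its book's label unless someone adds an explicit override for it.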
Thanks!
Thanks @MartijnNaaijer. I'll add this information to the readme before closing.
I've added a methodology description to the readme in https://github.com/ETCBC/genre_synvar/commit/5ca08512c2d5070f189ed7485a93c12ad4daacd4. Thank you both for improving the dataset!
Could you add some of the logic guiding the tagging decisions to the readme? I assume it's somewhere in Martijn's thesis, but it seems like a good idea to keep that close to the data.
Also, it would be useful to list the genres that are tagged. I believe that it's: