dracor-org / dracor-schema

ODD and schemas for dracor.org files
https://dracor.org/doc/odd

Document How/Why to assign Genre to a play #74

Open ingoboerner opened 3 weeks ago

ingoboerner commented 3 weeks ago

This is more a diffuse bunch of questions than an actual TODO:

I will add a section on encoding genre to the upcoming new ODD version, see #67. As it turns out, I don't fully understand our encoding of genre. I would also be very interested in the genesis of this encoding strategy (I remember we had an e-mail conversation and there was something at a hackathon, see #3).

Anyways, we have a somewhat two-fold encoding of genre: A "human-readable label" in the <term> in <keywords> AND an external identifier (Wikidata QID) in <classCode>.

One example that I will put in the ODD:

<textClass>
  <keywords>
    <term type="genreTitle">Tragedy</term>
  </keywords>
  <classCode scheme="http://www.wikidata.org/entity/">Q80930</classCode>
</textClass>
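
To make the two-fold encoding concrete, here is a minimal Python sketch (standard library only) that reads both parts from a <textClass> like the one above: the genreTitle label from <term> and every QID from a <classCode> with the Wikidata scheme. It assumes the snippet may or may not carry the TEI namespace. The function name read_genre is hypothetical and not part of dracor-api.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

WIKIDATA_SCHEME = "http://www.wikidata.org/entity/"


def read_genre(text_class_xml: str) -> tuple[str | None, list[str]]:
    """Return the genreTitle label and all Wikidata QIDs from a <textClass>."""
    root = ET.fromstring(text_class_xml)
    # "{*}" matches the element in any namespace or none (Python 3.8+),
    # so this works for the bare snippet and for TEI-namespaced files.
    label = None
    for term in root.iterfind(".//{*}term"):
        if term.get("type") == "genreTitle":
            label = (term.text or "").strip()
            break
    # Keep only classCodes that use the Wikidata entity scheme, in document order.
    qids = [
        (cc.text or "").strip()
        for cc in root.iterfind(".//{*}classCode")
        if cc.get("scheme") == WIKIDATA_SCHEME
    ]
    return label, qids
```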

The draft for the section:

2.3.3.2. Genre

Within <textClass> in the <teiHeader>, the genre of the drama can be specified.

The content of the <term> element in the <keywords> section is considered the human-readable label for the classification contained in <classCode>. The value of <classCode> should be the QID of the respective concept on Wikidata. The following values are supported by the API:

lehkost commented 3 weeks ago

That's a tough one. The backdrop is that we wanted to do studies like this, where we needed coarse genre info, basically comedy or tragedy (or neither). After discussing this for quite some time, we followed Dario's advice that you also pointed to above.

However, I'm also not sure whether we rely on the Wikidata identifier for the genre or on the actual wording within <term>. Either way, this info is extracted and served via the API, e.g. in the corpus metadata files.

For FreDraCor, we even mapped more fine-grained genres down to a flattened set of genres, see here.

I remember that from the beginning we only wanted to encode the genre information given in the subtitle and to abstain from assigning a genre to a play if it isn't clear from the subtitle. But there is also the satirical use of genre information, as in this play, which goes by "Tragedy" but clearly is a satire (when encoding, I left the genre info empty in this file). So one answer would be: encoding genre info is the responsibility of the encoder(s).

Oh, regarding this other question, I don't think we can assign two different genres at the moment (at least we don't use this info, I think).

Maybe also good to involve @peertrilcke in this discussion.

ingoboerner commented 3 weeks ago

Thanks for that input. In the code I found the supported values:

"Q40831": "Comedy", "Q80930": "Tragedy", "Q192881": "Tragicomedy", "Q1050848": "Satyr play", "Q131084": "Libretto"
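
Since the label in <term> and the QID in <classCode> can drift apart, a small consistency check against these supported values might be useful. This is a hypothetical Python sketch, not code from dracor-api: the table simply mirrors the values quoted above, and label_matches_code is an illustrative name.

```python
# Supported values as quoted from the API code in this thread.
SUPPORTED_GENRES = {
    "Q40831": "Comedy",
    "Q80930": "Tragedy",
    "Q192881": "Tragicomedy",
    "Q1050848": "Satyr play",
    "Q131084": "Libretto",
}


def label_matches_code(term_label: str, qid: str) -> bool:
    """Return True if the <term> label agrees with the configured label for the QID.

    Unsupported QIDs never match. The comparison ignores case and
    surrounding whitespace.
    """
    expected = SUPPORTED_GENRES.get(qid)
    return expected is not None and expected.casefold() == term_label.strip().casefold()
```

For example, a <term> reading "Tragedy" next to the classCode Q80930 passes, while "Comedy" next to Q80930 would be flagged.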

Nothing that we should implement right away, but just for the record: one option would be to provide a means of marking the part of the source that serves as evidence for the assigned genre. For example, in <title type="sub">Ein Trauerspiel in fünf Akten</title> we could use <rs> with some designated type to mark up the term "Trauerspiel" and then refer to it from the genre classification. Then we could automatically check whether a play contains evidence supporting its genre information.
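
Such an automatic check could look roughly like the following Python sketch. It assumes a hypothetical convention, <rs type="genre"> inside <title type="sub">, plus a hypothetical table of evidence words per QID; neither exists in the current schema, and has_genre_evidence is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword table: which marked-up subtitle words count as evidence for a QID.
EVIDENCE_TERMS = {
    "Q80930": {"trauerspiel", "tragödie"},  # Tragedy
    "Q40831": {"lustspiel", "komödie"},     # Comedy
}


def has_genre_evidence(subtitle_xml: str, qid: str) -> bool:
    """Check whether a subtitle marks up a term that supports the given QID.

    Assumes the hypothetical markup <rs type="genre">...</rs> inside
    <title type="sub">; this is not part of the current DraCor schema.
    """
    title = ET.fromstring(subtitle_xml)
    allowed = EVIDENCE_TERMS.get(qid, set())
    for rs in title.iterfind(".//{*}rs"):
        if rs.get("type") == "genre" and (rs.text or "").strip().lower() in allowed:
            return True
    return False
```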

cmil commented 3 weeks ago

From inside textClass, the API only uses the classCode element, and it ignores any code that is not among the supported ones @ingoboerner mentioned above. So we use a fixed set of values, with the labels associated to the Wikidata IDs by configuration.

Technically there can be multiple classCodes. We use this to assign a genre and to indicate whether a play is also a libretto. For this to work, we follow the convention that if the libretto classCode applies, it must be the first in the list of classCodes; otherwise the first classCode is interpreted as a genre designation. See https://github.com/dracor-org/dracor-api/blob/9df15f727a210b9ff80bf32aaf3523926b3e8c8e/modules/util.xqm#L543-L546.
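
The classCode convention can be sketched in Python as follows. This illustrates the rule as stated in this comment and is not a port of util.xqm: it assumes the input is the ordered list of Wikidata-scheme classCode values, and interpret_class_codes is a hypothetical name.

```python
from __future__ import annotations

# Genre codes from the supported values quoted in this thread.
GENRE_LABELS = {
    "Q40831": "Comedy",
    "Q80930": "Tragedy",
    "Q192881": "Tragicomedy",
    "Q1050848": "Satyr play",
}
LIBRETTO_QID = "Q131084"


def interpret_class_codes(qids: list[str]) -> tuple[str | None, bool]:
    """Return (genre label or None, is_libretto) for an ordered list of QIDs."""
    if not qids:
        return None, False
    # The libretto code is only recognized in first position.
    is_libretto = qids[0] == LIBRETTO_QID
    # The genre is then expected in second position, otherwise in first position.
    genre_index = 1 if is_libretto else 0
    genre_qid = qids[genre_index] if genre_index < len(qids) else None
    # Codes outside the fixed set yield no genre label.
    return GENRE_LABELS.get(genre_qid), is_libretto
```

For example, ["Q131084", "Q40831"] yields ("Comedy", True), while ["Q40831", "Q131084"] yields ("Comedy", False), because the libretto code is only recognized in first position.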

So textClass is technically not restricted to genre information and could be extended in the future. We could perhaps also drop the restriction that the libretto classCode has to come first to be recognized by the API.