dracor-org / dracor-api

eXistdb application for dracor.org
MIT License

Introduce "Fragment" as new possible value for <textClass> #177

Closed lehkost closed 5 months ago

lehkost commented 1 year ago

As a follow-up to https://github.com/dracor-org/dracor-api/issues/120, I'd like to suggest introducing "Fragment" as a new possible value for <textClass>. Many unfinished plays are part of the canon, such as Goethe's "Die Aufgeregten", Hölderlin's "Empedokles" and Büchner's "Woyzeck" (the latter is already part of GerDraCor: ger000564). It would be very useful to be able to flag them as fragments and to include this information in the metadata table as well (Fragment "true" or "false", just like the "Libretto" column). That way, fragments could easily be excluded from certain calculations.

This would be an obvious encoding suggestion:

<textClass>
  <keywords>
    <term type="genreTitle">Fragment</term>
  </keywords>
  <classCode scheme="http://www.wikidata.org/entity/">Q1440453</classCode>
</textClass>
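
To illustrate how a "Fragment" true/false value for the metadata table could be derived from this encoding, here is a minimal Python sketch. It is not how dracor-api (an eXistdb/XQuery application) does anything; the function name `is_fragment`, the sample document and the choice to look only at <classCode> are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
WIKIDATA_SCHEME = "http://www.wikidata.org/entity/"
FRAGMENT_ID = "Q1440453"  # Wikidata ID from the proposal above

# Hypothetical minimal TEI document wrapping the proposed <textClass>.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <textClass>
        <keywords>
          <term type="genreTitle">Fragment</term>
        </keywords>
        <classCode scheme="http://www.wikidata.org/entity/">Q1440453</classCode>
      </textClass>
    </profileDesc>
  </teiHeader>
</TEI>"""


def is_fragment(tei_xml: str) -> bool:
    """Return True if any Wikidata classCode in textClass is the fragment ID."""
    root = ET.fromstring(tei_xml)
    for code in root.iterfind(".//tei:textClass/tei:classCode", TEI_NS):
        same_scheme = code.get("scheme") == WIKIDATA_SCHEME
        if same_scheme and (code.text or "").strip() == FRAGMENT_ID:
            return True
    return False


print(is_fragment(SAMPLE))
```

Because the check scans every <classCode>, it would keep working if a play carried a genre code and the fragment code side by side.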

Two things we should discuss:

  1. Q1440453 is described in Wikidata as "genre or piece of a larger work". If I'm not mistaken, with plays it will always be "piece of a larger work" and not a "genre" (as it is in Novalis' text collection "Blüthenstaub" and other works of Early Romanticism). We could still use <term type="genreTitle">, but maybe there is a better way to describe a play as a fragment?
  2. As far as I can see, there is no play in our collection yet that is marked up with two genres, like "Libretto" AND "Comedy". If we, say, additionally mark up a "Comedy" as "Fragment", would we just add the new keyword within the same <textClass> element? I think it would be cleaner to add another <textClass> element, no? Otherwise it could be unclear which keyword the <classCode> value belongs to.

peertrilcke commented 1 year ago

"Fragment" is, after all, a term that carries a heavy theoretical and historical load. I would refrain from using it here. What is meant is that the works are unfinished, isn't it? Is there no other way to represent this in TEI? Solving this with a "genreTitle" makes little sense from my point of view; you name the problems yourself. If a "comedy" has remained a "fragment", the attributes "comedy" and "fragment" are on very different levels (so "no" to No. 2 and "yes" to No. 1).

TEI has status="unfinished" in revisionDesc, but as far as I can see this refers to the TEI file rather than to the work represented in it. Surely it would be better placed somewhere in sourceDesc? Maybe there is a solution there? I can't find anything offhand, but maybe someone else has an idea?

cmil commented 7 months ago

> Two things we should discuss:
>
>   1. Q1440453 in Wikidata is both "genre or piece of a larger work". If I'm not mistaken, with plays it will always be "piece of a larger work" and not a "genre" (like in Novalis' text collection "Blüthenstaub" and other works of Early Romanticism). We could still use <term type="genreTitle">, but maybe there is a better way to describe a play as fragment?

The API currently ignores the keywords/term elements completely. Instead, it maintains its own mapping of (recognized) Wikidata IDs to suitable labels. If you still want to establish rules for what to put into keywords, dracor-schema and the ODD would be a better place to do that.

>   2. As far as I can see, there is no play yet marked-up with two genres, like "Libretto" AND "Comedy" in our collection. If we, say, additionally mark-up a "Comedy" as "Fragment", would we just add the new keyword within the same <textClass> element? I think it would be cleaner to add another <textClass> element, no? Otherwise it could be unclear which keyword the <classCode> value belongs to.

Currently, when there are multiple classCode elements (either in the same or in separate textClass elements), the API picks the first one, displays it as normalizedGenre, and ignores all others. I read https://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html#HD43 as meaning that textClass is a single wrapper for all kinds of text classification specifications, although the schema would allow multiple elements. However, there is no intrinsic relation between classCode and keywords, and I don't think we need to artificially establish one. If we need to use multiple classCodes, I'd suggest putting them all into the same textClass. Maybe we should then add a textClassCodes array to the play JSON to make them available via the API.
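
A rough Python sketch of the behaviour described here, plus the proposed textClassCodes array, might look as follows. This is only an illustration, not the actual dracor-api code (which is XQuery): the mapping `GENRE_LABELS`, the functions `normalized_genre` and `play_json`, and the play ID are all hypothetical. The Wikidata IDs other than Q1440453 do not come from this thread and should be checked before use.

```python
from typing import Optional

# Hypothetical excerpt of an ID-to-label mapping; the real mapping kept by
# the API may contain different entries. IDs here are assumptions to verify.
GENRE_LABELS = {
    "Q40831": "Comedy",
    "Q80930": "Tragedy",
    "Q131084": "Libretto",
}


def normalized_genre(class_codes: list[str]) -> Optional[str]:
    """Mimic the described behaviour: only the first classCode is considered."""
    if not class_codes:
        return None
    return GENRE_LABELS.get(class_codes[0])


def play_json(play_id: str, class_codes: list[str]) -> dict:
    """Build a play record that also exposes every classCode (the proposal)."""
    return {
        "id": play_id,
        "normalizedGenre": normalized_genre(class_codes),
        "textClassCodes": list(class_codes),
    }


# Made-up play carrying a genre code and the fragment code in one textClass.
play = play_json("example000001", ["Q80930", "Q1440453"])
print(play["normalizedGenre"], play["textClassCodes"])
```

With such an array, a client could test for Q1440453 itself, while normalizedGenre would stay a single value.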

cmil commented 7 months ago

I somehow missed @peertrilcke's comment and went ahead adding the Wikidata ID for literary fragments to our recognised class codes. I think it doesn't do much harm there but if, for scientific reasons, we wouldn't recommend using this class code/Wikidata ID, we should probably remove it again. What do you think @lehkost and @peertrilcke?

lehkost commented 7 months ago

Thanks, Carsten. It's okay for now, I think. We pointed out all the problems with the status 'fragment' above. I would use the term in the sense of "unfinished" as discussed above, and I'm open to placing this kind of metadata in a better-suited part of the TEI document.

An alternative to encoding this within a single TEI file would be to collect fragments of plays in a list/collection with DraCor IDs, just like we did with the bourgeois tragedy. That way, there can be different lists of different people for different purposes.
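
As a sketch of how such an external list could be used to exclude fragments from a calculation, assuming the metadata table has been loaded as a list of rows with an "id" field (the row layout, the second ID and the function name `exclude_listed` are made up for illustration):

```python
# Externally curated list of DraCor IDs; ger000564 (Woyzeck) is the only
# fragment ID mentioned in this thread.
FRAGMENT_IDS = {"ger000564"}

# Hypothetical metadata rows; "ger000000" is a placeholder, not a real play.
ROWS = [
    {"id": "ger000564", "title": "Woyzeck"},
    {"id": "ger000000", "title": "Placeholder play"},
]


def exclude_listed(rows: list[dict], listed_ids: set[str]) -> list[dict]:
    """Return only the rows whose ID is not on the given list."""
    return [row for row in rows if row["id"] not in listed_ids]


remaining = exclude_listed(ROWS, FRAGMENT_IDS)
print([row["id"] for row in remaining])
```

Different people could pass in different lists without any change to the TEI files.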

cmil commented 5 months ago

I reverted the addition of the text class and will close the issue.