dracor-org / dracor-schema

ODD and schemas for dracor.org files
https://dracor.org/doc/odd

explicitly encode the information that a play is a translation of some other play (in DraCor) #56

Open ingoboerner opened 3 months ago

ingoboerner commented 3 months ago

Maybe re-use <listRelation type="translation"> and include a <relation> with links to the plays.

lehkost commented 3 months ago

I opened another issue on this: https://github.com/dracor-org/dracor-schema/issues/59. This should be easier to do and already adds visibility to a play being a translation.

There's one problem I see with the suggestion to use listRelation. In DraCor, we have 'editions' that we also see as representations of a 'work', which is why we link them to the 'work' entries in Wikidata (also the assumption behind the DraCor Property in Wikidata). In this sense, we could say, for example: ger000662 is a translation of fre001256.

But this could give a false impression of the actual text a translation is based on, which is usually a specific edition. Other translations of the same 'work' might be based on other versions and differ quite a lot.

Plus, in some corpora we have several versions of a play (e.g., fre001199 and fre001200). Which one should we link to? So, if we were to store information about the original, we should probably use Wikidata IDs, not DraCor IDs.

lehkost commented 3 months ago

@ingoboerner: Taking into account that translation and adaptation are different things and that it's sometimes hard to decide what is what, we could indeed introduce a type like "based on" or "adapted from" as used in Wikidata. We could also stick to "translation" and document that, in DraCor, we include adaptations in this type.

I picked up a consensus that we should not link originals and translations between DraCor IDs, but to link translations/adaptations to the literary work on Wikidata, right? (This might include starting a Wikidata item for an original play if there's none yet.)

With that, we could actually build this option into the schema and start using it.

Regarding the frontend issues (see #59): I think these are two tickets we could resolve relatively quickly.

cmil commented 3 months ago

Do we actually need to change anything in the schema for that? I would suggest using the <listRelation> we already have in <standOff> and adding <relation> elements to it with an appropriate name attribute. Which attribute values we support should be documented. Currently we only support relations with the name "wikidata". We could implement support for "translation" or "based-on".
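A minimal sketch of this proposal, assuming `translation` is added as a new `name` value next to the existing `wikidata` one and following the `active`/`passive` pattern of the other `<relation>` examples in this thread. The IDs ger000999 and fre001199 are only illustrative, and `QXXXXXX` is a placeholder, not a real Wikidata item:

```xml
<standOff>
  <listRelation>
    <!-- existing relation type; QXXXXXX is a placeholder QID -->
    <relation name="wikidata"
              active="https://dracor.org/entity/ger000999"
              passive="http://www.wikidata.org/entity/QXXXXXX"/>
    <!-- hypothetical new value: ger000999 is a translation of fre001199 -->
    <relation name="translation"
              active="https://dracor.org/entity/ger000999"
              passive="https://dracor.org/entity/fre001199"/>
  </listRelation>
</standOff>
```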

ingoboerner commented 3 months ago

I am also in favor of re-using <relation> in the <standOff> container. Maybe we should look into other ontologies to see what good relation types could be.

> I picked up a consensus that we should not link originals and translations between DraCor IDs, but to link translations/adaptations to the literary work on Wikidata, right? (This might include starting a Wikidata item for an original play if there's none yet.)

No, I am in favor of using DraCor IDs for links rather than taking the detour via Wikidata. The relation does not say on which layer of the Work-Expression-Manifestation-Item hierarchy the items are linked, so we are totally free to define that ourselves without relying on Wikidata at all.

lehkost commented 3 months ago

> I picked up a consensus that we should not link originals and translations between DraCor IDs, but to link translations/adaptations to the literary work on Wikidata, right? (This might include starting a Wikidata item for an original play if there's none yet.)

> No, I am in favor of using DraCor IDs for links rather than taking the detour via Wikidata. The relation does not say on which layer of the Work-Expression-Manifestation-Item hierarchy the items are linked, so we are totally free to define that ourselves without relying on Wikidata at all.

I see the beauty of it: the internal DraCor network would grow directly between different corpora. However, as I point out above, this is not always possible with 1:1 relations. If, for example, we had a German translation of Racine's Thébaïde (let's give it the fictitious ID ger000999), then there are at least two possible relations:
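A sketch of the two candidates, assuming they point to the two FreDraCor versions fre001199 and fre001200 mentioned earlier (which versions are meant is an assumption; the `based-on` name follows the Wikidata example below):

```xml
<!-- option 1 (hypothetical): link the translation to one version -->
<relation name="based-on" active="https://dracor.org/entity/ger000999" passive="https://dracor.org/entity/fre001199"/>
<!-- option 2 (hypothetical): link the translation to the other version -->
<relation name="based-on" active="https://dracor.org/entity/ger000999" passive="https://dracor.org/entity/fre001200"/>
```

Both are equally well-formed, and nothing in either encoding records which version the translator actually worked from.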

This is not a unique case but a general problem in some of our corpora, which would be solved by linking to the Wikidata item Q3213141, which represents the work as such rather than its editions:

<relation name="based-on" active="https://dracor.org/entity/ger000999" passive="http://www.wikidata.org/entity/Q3213141"/>

lehkost commented 3 months ago

> Maybe we should look into other ontologies to see what good relation types could be.

Good idea. Wikidata, for example, has two relevant properties, which are sometimes used interchangeably (which is itself a problem, of course):

ingoboerner commented 3 months ago

Thanks for the examples, but you probably meant to replace the Wikidata base URI with DraCor's, right?

<relation name="based-on" active="https://dracor.org/entity/ger000999" passive="https://dracor.org/entity/fre001199"/>

I think I have to explain in a bit more detail why I don't understand why we could not use DraCor IDs, or URIs derived from them, to express the "translation" relation within DraCor without the detour via Wikidata. I'll look only at the encoding and the current technical implementation, OK?

> However, as I point out above, this is not always possible with 1:1 relations.

I think this is a question of formally defining the property (i.e. the relation type given in @name). What does "based on" ("based-on") in the examples mean? This needs to be formally defined; otherwise it makes no difference which identifier you actually use.
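One way to make such a definition part of the schema, as a hypothetical sketch: a TEI ODD customization that restricts and documents the values of `@name` on `<relation>`. It assumes the `elementSpec` sits inside the DraCor `schemaSpec`; the value list and the wording of the descriptions are placeholders, not agreed definitions. An ODD `<desc>` documents a value for human readers but does not by itself give it machine-readable (e.g. RDF) semantics:

```xml
<elementSpec ident="relation" mode="change">
  <attList>
    <attDef ident="name" mode="change">
      <!-- hypothetical value list; descriptions are placeholders -->
      <valList type="closed" mode="add">
        <valItem ident="wikidata">
          <desc>The play in @active corresponds to the Wikidata item in @passive.</desc>
        </valItem>
        <valItem ident="based-on">
          <desc>The play in @active is a translation or adaptation of the play or work in @passive.</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```

With `type="closed"`, validation fails on any value not listed, so every name already in use would need a `valItem`; `type="semi"` would document the recommended values while still allowing others.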

It is a little different with the identifiers (URIs) that we use as values of @active and @passive, which are, at least from a semantic-web standpoint, defined, i.e. it is at least foreseen that a client could dereference the URI and find more information about it. This would work by sending an Accept header requesting RDF (see https://dracor.org/doc/api#/public/resolve-id). For JSON this works, for RDF not so much: I tested application/rdf+xml as the value of the Accept header without success, whereas application/json works. So, currently, in a semantic-web context, https://dracor.org/entity/ger000999 does not mean anything apart from being a thing that is identified by exactly this URI.

For reference, the code here: https://github.com/dracor-org/dracor-api/blob/4446437a4fbced7718f93d97a3ec2a76021d2332/modules/api.xqm#L169-L196 sends the status code 303 See Other (https://en.wikipedia.org/wiki/HTTP_303), which is the desired behaviour according to https://www.w3.org/TR/cooluris/, and, in theory, application/rdf+xml (https://www.iana.org/assignments/media-types/application/rdf+xml) is implemented here: https://github.com/dracor-org/dracor-api/blob/4446437a4fbced7718f93d97a3ec2a76021d2332/modules/util.xqm#L1271C25-L1272

You say:

> In DraCor, we have 'editions' that we also see as representations of a 'work', which is why we link them to the 'work' entries in Wikidata (also the assumption behind the DraCor Property in Wikidata)

In the DraCor data context this is not obvious, because https://dracor.org/entity/ger000999, like any other URI, does not mean anything here by itself. From this standpoint you cannot argue that they represent editions of works; this is obviously not the case, as they merely identify an individual but unspecified thing.

We could also think of somehow using the DraCor IDs, e.g. ger000999, as a means of expressing that something has a yet-to-be-defined relation to something else:

We are now in the TEI realm, really. These IDs are (currently) introduced in the TEI files as values of the attribute @xml:id on the root element, as opposed to the earlier use of these identifiers in the element <idno>. I looked at this change in the context of our CLS INFRA D7.3 report:

The commit d23a93d9fa0e4eb53a580904ac5d01c8b8f8037c dating from 3 June, 2022 adds the DraCor ID as the value of the attribute @xml:id to the root element and changes the encoding of the reference to Wikidata from a to a with the value "wikidata" of the attribute @type to 569 of all 569 play TEI files available at that time.

If you compare the two uses of the DraCor IDs before and after the significant change mentioned above:

1) <TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="ger"> and

<publicationStmt>
  <publisher xml:id="dracor">DraCor</publisher>
  <idno type="URL">https://dracor.org</idno>
  <idno type="dracor" xml:base="https://dracor.org/id/">ger000569</idno>
  <availability>
    <licence>
      <ab>CC0 1.0</ab>
      <ref target="https://creativecommons.org/publicdomain/zero/1.0/">Licence</ref>
    </licence>
  </availability>
</publicationStmt>

VS

2) <TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="ger000569" xml:lang="ger">

I would understand this as follows:

In 1) we have an identifier that identifies a publication product, because, according to the TEI Guidelines, <publicationStmt> "groups information concerning the publication or distribution of an electronic or other text." So, in this case, ger000569 identifies the electronic, i.e. DraCor-encoded, version of a text. The <sourceDesc> contains the bibliographic data of the source, so with this encoding it was somewhat explicit that ger000569 identified the DraCor-encoded version of the edition of the work "Am ersten Mai" by Karl Adolph, published by Neuer Akademischer Verlag in 1919; see below:

<bibl type="originalSource">
  <title>Karl Adolph: Am ersten Mai. Eine Tragikomödie der Arbeit aus Friedenstagen.
    Leipzig; Wien: Neuer Akademischer Verlag 1919.</title>
</bibl>

So, OK, I give you that, @lehkost: the DraCor ID (not the URI we currently use) identified an electronic version of a printed edition, which is a manifestation of an expression of a work (although there are no identifiers in place for the work). And I think this is still your current understanding when you address the problem of linking by using DraCor IDs or URIs derived from them.

But things changed in 2), because the semantics of ger000569 identifying an edition of a work were removed: the <idno> element was dropped from the <publicationStmt>, and the ID was instead included as the @xml:id on the root element. So to speak, we removed the ID from the TEI without re-introducing an equivalent. This, IMHO, "de-semanticized" the ID, making it only a technical identifier for retrieving <TEI> elements from the eXist database. Looking solely at the TEI, we have an identifier that identifies an XML element <tei:TEI> and nothing more.

And we don't really have anything else that would state that X is the identifier of an edition, because, to my knowledge, there is no other documentation explaining what a DraCor ID means.

I don't really know what the conclusion can be, apart from this: either we document what the identifiers mean, or anything goes, because the identifiers don't identify the things we think they do anyway.
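Taking the first option, the ODD could state the meaning of the ID explicitly. A hypothetical sketch, assuming a local change to the `xml:id` attribute on `<TEI>` inside the DraCor `schemaSpec`; the description merely paraphrases the reading of encoding 1) given above and is not an agreed definition:

```xml
<elementSpec ident="TEI" mode="change">
  <attList>
    <!-- hypothetical: document what the DraCor ID identifies -->
    <attDef ident="xml:id" mode="change">
      <desc>The DraCor ID of the play. It identifies the DraCor-encoded
        digital edition of the source described in
        sourceDesc/bibl[@type='originalSource'], not the work as such.</desc>
    </attDef>
  </attList>
</elementSpec>
```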