Fraunhofer-FIT-DSAI / drk-information-model

Datenraum Kultur (DRK) Information Model

DRK IM UC3: Find controlled vocabularies for representing ENUMs #6

Open rohitadeshmukh13 opened 4 months ago

rohitadeshmukh13 commented 4 months ago

Description

Example of how controlled vocabularies can be used to represent ENUMs

@prefix ex: <http://example.org/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix loclang: <http://id.loc.gov/vocabulary/iso639-2/> .
@prefix gndgeo: <https://d-nb.info/standards/vocab/gnd/geographic-area-code#> .

ex:Movie1 a ex:Movie ;
    dcterms:title "Movie1: subtitle1"@en ;
    ###
    # **Languages**:
    # Library of Congress's controlled vocabulary for languages based on ISO 639-2: https://id.loc.gov/vocabulary/iso639-2.html 
    # RDF N-Triples representation: https://id.loc.gov/vocabulary/iso639-2.nt
    ###
    ex:language loclang:eng ;
    ###
    # **Countries (Geographic Areas)**:
    # GND Geographic Area Codes: https://d-nb.info/standards/vocab/gnd/geographic-area-code.html
    # RDF TTL representation: https://d-nb.info/standards/vocab/gnd/geographic-area-code.ttl
    ###
    ex:releasedIn gndgeo:XA-DE .

RESULTS: Identified controlled vocabularies for UC3

rohitadeshmukh13 commented 4 months ago

From our discussion in the DRK IM UC3 meeting yesterday: for representing Genre ENUMs, we can investigate the following Library of Congress vocabularies:

peret commented 4 months ago

Regarding the production type (and event type): Wikidata already has entries for pretty much all of the types we want to consider for now. I'm not sure whether Wikidata entries are enough for your purposes, but here is a list:

Some of these also have a GND ID, but not all of them, as far as I can tell.

The only difference from our current UC3 data model is that there is no single entry for a "first performance" (Erstaufführung), but three more specific ones:

peret commented 4 months ago

Regarding a vocabulary for (theatrical) genres: I had a look at the Library of Congress links you provided and tried to find some of the terms/genres we picked for our list of enum values. I was able to find some of them in the LoC vocabulary, but not some of the more specific terms. I also compared this to what is available in the GND, which seems to cover more of the terms we are interested in.

To show what I mean, I put together a list of genre terms that might be interesting for us and that have an entry in the GND but, as far as I could find, NOT in the Library of Congress database:

In some cases, this might also be related to translation issues, i.e. I might not know the equivalent English term for a genre that exists in German. Also note that I'm not saying that the GND includes all possible genre terms we might be interested in (it doesn't), but so far it seems to be a better source than the Library of Congress vocabulary.

peret commented 4 months ago

@rohitadeshmukh13 To clarify, here are the values for eventType vs productionType:

eventType:

productionType:

Daham-Mustaf commented 3 months ago

Language Enumeration

For the Language Enumeration, we can use the following ontology:


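# Prefix declarations needed to parse this snippet.
# NOTE: the default namespace below is a placeholder (assumption); replace it with the project's real one.
@prefix :     <http://example.org/uc3/language#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .

# This ontology defines a Language enumeration for UC3.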
# It uses ISO 639-1 language codes and currently supports English and German.
# The ontology is extensible and includes instructions for adding new languages.

:Language a owl:Class ;
    owl:oneOf ( :en :de ) ;
    rdfs:comment "Enumeration of supported languages in the system using ISO 639-1 language codes." ;
    rdfs:seeAlso <https://www.loc.gov/standards/iso639-2/php/code_list.php> ;
    dc:source "ISO 639-1" ;
    skos:prefLabel "Language"@en, "Sprache"@de ;
    skos:note """
    This enumeration currently includes only English (en) and German (de).
    When extending the system to support additional languages:
    1. Add new language instances using their ISO 639-1 two-letter codes.
    2. Update the owl:oneOf list to include the new language instances.
    3. Ensure that all application logic and user interfaces support the newly added languages.
    4. Consider implementing a more flexible language handling system if the number of supported languages grows significantly.

    Example of adding French:
    :fr a :Language ;
        skos:prefLabel "French"@en, "Français"@fr ;
        skos:notation "fr" .

    Then update the oneOf list:
    owl:oneOf ( :en :de :fr )
    """ .

:en a :Language ;
    skos:prefLabel "English"@en, "Englisch"@de ;
    skos:notation "en" .

:de a :Language ;
    skos:prefLabel "German"@en, "Deutsch"@de ;
    skos:notation "de" .

rohitadeshmukh13 commented 3 months ago

Hi @peret, I've created RDF Turtle representations of controlled vocabularies for Theatrical Genres, Theatrical Production Types, and Theatrical Event Types. Since our starting point was only the term names, I've added translations and descriptions based on my understanding, referring to various online sources. Therefore, I would greatly appreciate it if you could review the values of the following properties:

Please feel free to make any necessary changes directly in the files and push them to the uc3 branch, or provide the updates via comments—whichever you prefer. Thank you.

peret commented 3 months ago

Thanks @rohitadeshmukh13. I had a look and created a PR with my changes, so y'all have a chance to double-check.

Two additional things I want to mention/ask:

  1. Regarding the language premiere, local premiere, and country premiere: we expect that we will often not receive specific information on whether a certain production is e.g. a language premiere or a country premiere, just that it's a premiere of some sort ("Erstaufführung"), so we might need a more general concept for this in our production types. Unfortunately, these terms don't seem to be easily translatable. E.g. in German usage there is a difference between "Erstaufführung" and "Premiere", whereas in English both of these would be called "premiere", I think? I'm happy to discuss this further, if necessary.
  2. For the theatrical genres, some of them are actually in relationships with each other. E.g. all three of Opera, Operetta, and Musical are forms of Music Theater. Would this vocabulary definition be the place to define these relationships? Are you already planning to add that? Would that be necessary or useful in the context of the Information Model?
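
To make both points concrete, here is a minimal SKOS sketch of how such relationships could be expressed in the vocabulary files. It is not taken from the repository: the `ex:` namespace, the concept scheme names, and all concept identifiers are hypothetical, and whether the hierarchy belongs in these files is exactly the open question above.

```turtle
# Placeholder namespace and identifiers (assumptions for illustration only).
@prefix ex:   <http://example.org/uc3/vocab/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Point 2: Opera, Operetta, and Musical as narrower concepts of Music Theater.
ex:TheatricalGenres a skos:ConceptScheme ;
    skos:prefLabel "Theatrical genres"@en ;
    skos:hasTopConcept ex:MusicTheater .

ex:MusicTheater a skos:Concept ;
    skos:inScheme ex:TheatricalGenres ;
    skos:topConceptOf ex:TheatricalGenres ;
    skos:prefLabel "Music theater"@en, "Musiktheater"@de ;
    skos:narrower ex:Opera, ex:Operetta, ex:Musical .

ex:Opera a skos:Concept ;
    skos:inScheme ex:TheatricalGenres ;
    skos:prefLabel "Opera"@en, "Oper"@de ;
    skos:broader ex:MusicTheater .

ex:Operetta a skos:Concept ;
    skos:inScheme ex:TheatricalGenres ;
    skos:prefLabel "Operetta"@en, "Operette"@de ;
    skos:broader ex:MusicTheater .

ex:Musical a skos:Concept ;
    skos:inScheme ex:TheatricalGenres ;
    skos:prefLabel "Musical"@en, "Musical"@de ;
    skos:broader ex:MusicTheater .

# Point 1: a general "Erstaufführung" concept with the specific premieres below it.
ex:ProductionTypes a skos:ConceptScheme ;
    skos:prefLabel "Theatrical production types"@en .

ex:FirstPerformance a skos:Concept ;
    skos:inScheme ex:ProductionTypes ;
    skos:prefLabel "First performance"@en, "Erstaufführung"@de ;
    skos:narrower ex:LanguagePremiere, ex:LocalPremiere, ex:CountryPremiere .

ex:LanguagePremiere a skos:Concept ;
    skos:inScheme ex:ProductionTypes ;
    skos:prefLabel "Language premiere"@en ;
    skos:broader ex:FirstPerformance .

ex:LocalPremiere a skos:Concept ;
    skos:inScheme ex:ProductionTypes ;
    skos:prefLabel "Local premiere"@en ;
    skos:broader ex:FirstPerformance .

ex:CountryPremiere a skos:Concept ;
    skos:inScheme ex:ProductionTypes ;
    skos:prefLabel "Country premiere"@en ;
    skos:broader ex:FirstPerformance .
```

With this shape, a data provider who only knows that something is an "Erstaufführung" could use the general concept, while more precise sources could use one of the narrower ones, and consumers could still query across all of them via `skos:broader`.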