dracor-org / romdracor

Roman drama corpus, adapted from Perseus Digital Library.

Investigate data on characters on wikidata #6

Open ingoboerner opened 4 years ago

ingoboerner commented 4 years ago

@nevmenandr found out that Wikidata is inconsistent when it comes to mythological characters. Please provide examples!

ingoboerner commented 4 years ago

Useful properties: https://www.wikidata.org/wiki/Property:P1441

ingoboerner commented 4 years ago

PREFIX frbroo: <http://iflastandards.info/ns/fr/frbr/frbroo/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?wd ?label FROM <https://dracor.org/rom> WHERE 
{?character a frbroo:F38_Character ;
            rdfs:label ?label ;
            owl:sameAs ?wd 
  FILTER(LANG(?label) = 'latin')
} 
ORDER BY ?label

returns 51 unique links to Wikidata entities

queryResults.csv.zip

ingoboerner commented 4 years ago

OK, the query should look for DISTINCT ?wd only, because the labels (of course) vary.
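
A minimal sketch of that revised query, assuming the same graph and vocabulary as the query above. Dropping ?label also drops the language filter and the rdfs prefix, so characters without a matching label are no longer excluded and the result count may differ from the 51 reported earlier.

```sparql
PREFIX frbroo: <http://iflastandards.info/ns/fr/frbr/frbroo/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

# One row per distinct Wikidata URI linked from a character in the corpus graph.
SELECT DISTINCT ?wd FROM <https://dracor.org/rom> WHERE
{
  ?character a frbroo:F38_Character ;
             owl:sameAs ?wd .
}
ORDER BY ?wd
```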

nevmenandr commented 4 years ago

@nevmenandr found out that Wikidata is inconsistent when it comes to mythological characters. Please provide examples!

In fact, this is not about inconsistency in Wikidata, but about how inconvenient its URIs are for the purposes of our cross-corpora markup. There are three types of such cases.

  1. Umbra case. In ancient plays, a character and that character's shade (umbra) often appear as separate roles, sometimes within the same play. They then have different DraCor ids but the same Wikidata URI. See Tantalus and Tantali Umbra in Thyestes by Seneca, and Δαρεῖος and Εἴδωλον Δαρείου in Persians by Aeschylus.
  2. Ivppiter case. The task of cross-corpora markup is to tie together characters from different plays and different corpora, so that Achilles in Euripides and Achilles in Kleist resolve to the same data object. From this point of view, Zeus and Jupiter are obviously the same character: they perform the same function in the story of Hercules. But Wikidata has different URIs for Zeus and Jupiter, so our data is split. Perhaps this can be solved with an additional property in the SPARQL query, P460. But the property needed differs from case to case (it depends on whether we are dealing with gods, kings and so on).
  3. Talthybius case. Talthybius has two relevant items in Wikidata. The first is the character from the myth. This URI lets us connect the character in Troades by Seneca with the one in The Trojan Women by Euripides. But the character of the Euripides tragedy also has his own URI in Wikidata! Technically, the attribute for that character should have the value http://www.wikidata.org/entity/Q60607554, but this contradicts the idea of tying the characters of different plays together. An additional SPARQL property here could be P1074 (fictional analog of). We could also add present in work (P1441) data from DraCor.
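
The lookups suggested in cases 2 and 3 could be sketched roughly as follows. This is an untested sketch meant for the Wikidata Query Service, not a settled approach: it only lists what an item links to via P460 or P1074, and it starts from the Talthybius item quoted above as an example. Whether that item actually carries either statement is not checked here.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?item ?property ?linked ?linkedLabel WHERE {
  # Example item from the Talthybius case; swap in any other character URI.
  VALUES ?item { wd:Q60607554 }
  # Candidate linking properties named in this thread (P460, P1074).
  VALUES ?property { wdt:P460 wdt:P1074 }
  ?item ?property ?linked .
  # English label of the linked item, if one exists.
  OPTIONAL {
    ?linked rdfs:label ?linkedLabel .
    FILTER(LANG(?linkedLabel) = "en")
  }
}
```

If the right property really does vary by kind of character, the VALUES list for ?property would have to be extended per case, which is exactly the maintenance cost described in case 2.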

ingoboerner commented 4 years ago

Count overlaps of characters, i.e. how many plays link a character to each Wikidata URI:

PREFIX frbroo: <http://iflastandards.info/ns/fr/frbr/frbroo/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
SELECT ?wd (COUNT(?drama) AS ?CNT) WHERE 
{
  GRAPH ?g { 
    ?character a frbroo:F38_Character ;
        owl:sameAs ?wd .

    ?drama schema:character ?character .
  }
} 
GROUP BY ?wd
ORDER BY DESC(?CNT) 

alternative:

PREFIX frbroo: <http://iflastandards.info/ns/fr/frbr/frbroo/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
# <urn:x-arq:UnionGraph> is Apache Jena's name for the union of all named graphs
SELECT ?wd (COUNT(?drama) AS ?cnt)  FROM <urn:x-arq:UnionGraph> WHERE 
{
  ?character a frbroo:F38_Character ;
             owl:sameAs ?wd .
  ?drama schema:character ?character .

} 
GROUP BY ?wd
ORDER BY DESC(?cnt)
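
A possible variant for surfacing the Umbra case described earlier in this thread: group by play as well as by Wikidata URI and keep only groups where more than one character of the same play points to the same URI. This is a sketch that reuses the vocabulary of the queries above and has not been run against the data.

```sparql
PREFIX frbroo: <http://iflastandards.info/ns/fr/frbr/frbroo/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX schema: <http://schema.org/>

# Plays in which two or more characters share one Wikidata URI.
SELECT ?drama ?wd (COUNT(DISTINCT ?character) AS ?cnt) WHERE
{
  GRAPH ?g {
    ?character a frbroo:F38_Character ;
               owl:sameAs ?wd .
    ?drama schema:character ?character .
  }
}
GROUP BY ?drama ?wd
HAVING (COUNT(DISTINCT ?character) > 1)
ORDER BY DESC(?cnt)
```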