Open tla opened 1 year ago
The facts of the deaths themselves are already in the database; here we are parsing and adding the date information. We can discuss the details further on Wednesday and make notes in this issue.
Also related to issue #2: the revised version of the death factoids is completed. You can access the updated spreadsheet through this link.
The spreadsheet with death records has been updated with the sources on which I based the datings where my name is the authority. The file from 21.11.2023 has therefore been replaced by the file named "C11 PBW Death records, AA_revised version_09.01.2024.xlsx", accessible here: https://ucloud.univie.ac.at/index.php/f/797833040
Report from @lu-pl 💯 I implemented the table conversion for the editor rows, see example output. The P14 assertion for assigning Aleks or Marton is still missing, will add it today (+ some minor fixes).
Note that some SPARQL queries return empty, in which case no RDF is generated. See the logs. I haven't really looked into that (yet) because I think you said you would like to investigate the empty queries yourself.
Some of these are expected (where they are based on sources that we ended up not using), but others have to do with the fact that the Name column has something added in parentheses. So for example Ioannes (Smbat) 106 should just be queried as Ioannes 106. I don't know where the parenthetical text came from, but it needs to be stripped / ignored in all cases.
For sanity-checking purposes, it might be helpful to keep a list of the sources we aren't using; these include Council of 1157, Italikos, Niketas Choniates, Historia, Pantokrator Typikon, Prodromos, Historische Gedichte, and Tzetzes, Letters at least. If you could implement these as exclusions (i.e. if the Source canonical name is one of these, just skip the row) and output in the log what the source was every time a query returns nothing, this would help me audit a new run.
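The two requests above (ignore parenthetical text in Name, skip rows from unused sources and log the source of every empty query) could look roughly like the following. This is only a sketch, not the actual r11tab code: it assumes each table row is available as a dict keyed by the column names Name and Source canonical name, and the helper names clean_name, should_skip and report_empty are hypothetical.

```python
import logging
import re

logger = logging.getLogger("deaths_conversion")  # hypothetical logger name

# Sources named in this thread as not being used.
EXCLUDED_SOURCES = {
    "Council of 1157",
    "Italikos",
    "Niketas Choniates, Historia",
    "Pantokrator Typikon",
    "Prodromos, Historische Gedichte",
    "Tzetzes, Letters",
}

# Matches a parenthetical together with the whitespace in front of it.
_PARENTHETICAL = re.compile(r"\s*\([^)]*\)")


def clean_name(name):
    """Drop parenthetical additions, e.g. 'Ioannes (Smbat) 106' -> 'Ioannes 106'."""
    return " ".join(_PARENTHETICAL.sub("", name).split())


def should_skip(row):
    """Return True if the row is based on one of the unused sources."""
    return row["Source canonical name"].strip() in EXCLUDED_SOURCES


def report_empty(row):
    """Log the name and source of a row whose query returned nothing."""
    logger.warning(
        "empty query: name=%r source=%r",
        clean_name(row["Name"]),
        row["Source canonical name"],
    )
```

Keeping the exclusions in one set makes it easy to compare the list against the log after a new run.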
Update:
Parenthetical text in Name fields gets ignored now and unused Source values are skipped (see the log).
The script now generates a trig file deaths.trig with a named graph for every table partition.
I also investigated the empty queries; some of those were caused by typos or incomplete PBW strings in the tables. I queried the store for the correct PBW strings and manually updated the tables in the r11tab/tables/xlsx folder.
For the remaining empty queries, in most cases the PBW data is missing in the triplestore, so I don't really know what to do about that.
Note: I would like to/will port the metadata schema used in the r11cli application to the table conversion at some point, if that is alright.
I've now looked at the empty queries, which have three causes:
Pantokrator Typikon
but the string was modified.
Christos Philanthropos, note
every time) and so has a slightly different modeling structure (we didn't create a Text Expression for this publication, but instead we created a Manifestation Creation event whose authority is the publication author, i.e. the editor of the text). The following query should work.
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX star: <https://r11.eu/ns/star/>
select ?pub ?d ?a4 ?e
where {
?a1 a star:E13_crm_P3 ;
crm:P140_assigned_attribute_to ?d ;
crm:P141_assigned """She died on a November 1 [shortly after 1100, a year before <Isaakios 61>]"""@en ;
crm:P14_carried_out_by ?authority ;
crm:P17_was_motivated_by ?source .
?d a crm:E69_Death .
?a2 a star:E13_crm_P100 ;
crm:P140_assigned_attribute_to ?d ;
crm:P141_assigned ?p .
?p a crm:E21_Person .
?id a crm:E15_Identifier_Assignment ;
crm:P140_assigned_attribute_to ?p ;
crm:P37_assigned ?e42 .
?e42 a crm:E42_Identifier ;
crm:P190_has_symbolic_content "Anna 61" .
?a3 a star:E13_lrmoo_R15 ;
crm:P140_assigned_attribute_to ?pub ;
crm:P141_assigned ?source .
?a4 a star:E13_lrmoo_R24 ;
crm:P140_assigned_attribute_to ?pubcreation ;
crm:P141_assigned ?pub ;
crm:P14_carried_out_by ?e .
?e crm:P3_has_note ?editor .
} limit 1
I forgot the fourth case, which was a death record for Symbatios 101 from Iveron 2.178.5; this is from a document in the Iveron archive that was produced in 1098, which is past our cutoff point of 1095.
All empty query cases are handled now (see logs), and I updated the script to the new metadata schema.
The way this is implemented now, a named graph + metadata is generated for every table partition, see deaths.trig. Another option would be to merge all graphs into a single named graph and generate metadata only for that graph.
note: Metadata of course gets generated only once for every software execution, but every named graph is registered as being an output of that software execution, see the metadata graph.
The script now produces a single turtle file with all subgraphs merged, see deaths.ttl.
I had to slightly modify the metadata schema: metadata assertions are now pointing to E13 subject nodes instead of named graphs along L11_had_output. Since the range of L11 is D1_Digital_Object, this implies (and a reasoner would infer) that E13 assertions are D1s, i.e. E73_Information_Objects, which is not wrong but maybe something worth pointing out.
Meeting notes: Lukas has changed the metadata schema, which Tara will put on the Graph database. A new issue might be necessary for converting all old metadata into the new metadata schema.
Ingested deaths data to https://r11.eu/rdf/resource/deaths.
Note: Consolidation/merging of named graphs into another named graph can be automated using SPARQL update (INSERT) requests.
This should be implemented in r11cli.
edit: DROPing a named graph would not be reflected in the merged graph though, so one would need to SPARQL the merged triples out of the target graph before deleting the named graph!
delete { ?s ?p ?o . }
where {
graph <named_graph> {
?s ?p ?o .
}
} ;
drop graph <named_graph>
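For r11cli, the merge and its reversal could be generated as update strings along these lines. This is a sketch under assumptions: the function names and the example graph IRIs are hypothetical, and the merged graph is assumed to be a named graph of its own rather than the default graph.

```python
def merge_update(named_graph, target_graph):
    """SPARQL update that copies all triples of named_graph into target_graph."""
    return (
        f"INSERT {{ GRAPH <{target_graph}> {{ ?s ?p ?o . }} }}\n"
        f"WHERE {{ GRAPH <{named_graph}> {{ ?s ?p ?o . }} }}"
    )


def unmerge_and_drop_update(named_graph, target_graph):
    """SPARQL update that removes named_graph's triples from target_graph,
    then drops named_graph. The two operations are separated by ';'."""
    return (
        f"DELETE {{ GRAPH <{target_graph}> {{ ?s ?p ?o . }} }}\n"
        f"WHERE {{ GRAPH <{named_graph}> {{ ?s ?p ?o . }} }} ;\n"
        f"DROP GRAPH <{named_graph}>"
    )


if __name__ == "__main__":
    # Example IRIs are placeholders, not the real graph names.
    print(merge_update("https://example.org/graph/partition-1",
                       "https://example.org/graph/merged"))
    print(unmerge_and_drop_update("https://example.org/graph/partition-1",
                                  "https://example.org/graph/merged"))
```

One caveat: if the same triple was also merged in from a different named graph, the delete removes it from the merged graph as well, so the other graph would have to be merged again afterwards.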
Hi @lu-pl , concerning the metadata schema, I've just noticed a problem with the timestamps...
star:cd81994d8e a crmdig:D10_Software_Execution ;
crm:P82_begin_of_the_begin "2024-03-25T08:07:23.267077"^^xsd:dateTime ;
The first issue is that begin_of_the_begin is actually P82a, not P82 itself; the second issue is that a crmdig:D10_Software_Execution is a subclass of E7, not E52, which is what the domain of P82* is supposed to be. So this would need to be rewritten to something like:
star:cd81994d8e a crmdig:D10_Software_Execution ;
crm:P4_has_time-span [ crm:P82a_begin_of_the_begin "2024-03-25T08:07:23.267077"^^xsd:dateTime ] ;
hi @tla, the metadata issue should be fixed, see deaths.ttl.
LODKit now has a feature for Ontology derived ClosedNamespaces, so at least typos won't be an issue anymore.
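To illustrate the idea of a closed namespace for anyone reading along: this is not LODKit's actual API, only a minimal stand-in with a hypothetical class name and a hand-picked list of terms, whereas a real closed namespace would be derived from the ontology file. Looking up a term that is not in the vocabulary fails immediately instead of silently producing a wrong IRI.

```python
class ClosedNamespace:
    """A namespace that only resolves terms from a fixed vocabulary."""

    def __init__(self, base, terms):
        self._base = base
        self._terms = frozenset(terms)

    def __getitem__(self, term):
        # Unknown terms (e.g. typos) raise instead of yielding an IRI.
        if term not in self._terms:
            raise KeyError(f"{term!r} is not defined in <{self._base}>")
        return self._base + term


# Hand-picked subset of CIDOC CRM terms mentioned in this thread.
CRM = ClosedNamespace(
    "http://www.cidoc-crm.org/cidoc-crm/",
    ["P4_has_time-span", "P82a_begin_of_the_begin"],
)

if __name__ == "__main__":
    print(CRM["P82a_begin_of_the_begin"])
    try:
        CRM["P82_begin_of_the_begin"]  # the property name from the timestamp problem
    except KeyError as err:
        print("rejected:", err)
```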
Factoid data attached
c11deaths-AA.xlsx c11deaths-MR.xlsx