The goal of this project is to create a database of the early Irish Genealogies. Given the nature of the source material, the curators chose the Resource Description Framework (RDF) to represent it. Because this is a human-curated database, a human-readable representation of RDF was needed, and the curators chose TRiG as the concrete serialization. Readers without experience of RDF serializations are advised to read the Turtle specification before the TRiG one.
The database is structured by dividing the genealogies by manuscript. Each manuscript is given its own directory, whose name is derived from the manuscript's common scholarly abbreviation. For instance, all genealogies that are derived from the Book of Leinster are placed in the LL directory. The ontologies are placed in the top-level directory.
Each genealogy is divided into "items", each of which is represented
by one TRiG file in the manuscript's directory. The item file name is
created from its manuscript header. For instance, "Aisneidem Di
Araill" from the Book of Leinster has the file name
aisneidem_di_araill.trig.
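The mapping from header to file name can be sketched in Python. This is only an illustration inferred from the single example above: the function name item_file_name and the exact rule (lowercase the header, join the words with underscores, add .trig) are assumptions, and real file names in the database may deviate from it.

```python
import re

def item_file_name(header):
    """Build an item file name from a manuscript header (assumed rule)."""
    # Assumption: lowercase the header and join its words with underscores.
    words = re.findall(r"\w+", header.lower())
    return "_".join(words) + ".trig"

print(item_file_name("Aisneidem Di Araill"))  # aisneidem_di_araill.trig
```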
The curators have not always been consistent in the naming of the items. Especially in LL, "Genelogia" or "De Genelach" have been omitted.
Within each item, the individual entries are given a URL to represent
that particular entry in the genealogy. The URL for an individual
entry, which constitutes a node in the RDF graph, is generated from
the instance of their name exactly as it appears in the manuscript. If
the same name appears again in exactly the same form, whether or not
it refers to the same person, the first eight characters of a UUID
generated by uuidgen -r are appended to differentiate between the
instances. For example,
<CindFhaelad>
a foaf:Person;
irishRel:genName "Cind Fhaelad";
irishRel:nomName "Cenn Faelad";
rel:childOf <Airnelaig>.
<CindFhaelad-6e827350>
a foaf:Person;
irishRel:genName "Cind Fhaelad";
irishRel:nomName "Cenn Faelad";
rel:childOf <Gairb>.
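The naming rule described before this example can be sketched in Python. This is a minimal illustration, not the curators' tooling: the function name node_id and the set of already-used identifiers are hypothetical, the removal of spaces is inferred from the example (Cind Fhaelad becomes CindFhaelad), and uuid.uuid4() stands in for uuidgen -r, which likewise produces a random UUID.

```python
import uuid

def node_id(name, used):
    """Return a node identifier for a name form taken from the manuscript.

    `used` is a set of identifiers already taken in the item file
    (a hypothetical bookkeeping structure for this sketch).
    """
    # Assumption: spaces are dropped, as in "Cind Fhaelad" -> CindFhaelad.
    base = name.replace(" ", "")
    candidate = base
    while candidate in used:
        # Append the first eight characters of a random UUID.
        candidate = f"{base}-{str(uuid.uuid4())[:8]}"
    used.add(candidate)
    return candidate

used = set()
first = node_id("Cind Fhaelad", used)   # first occurrence keeps the plain form
second = node_id("Cind Fhaelad", used)  # repeated form receives a suffix
print(first, second)
```

The suffix is random, so the second identifier differs on every run; once an identifier has been written into an item file it would be kept as-is rather than regenerated.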
At present, all URLs are prefixed with http://example.com because a
permanent URL has not yet been purchased. For example, the full URL
for <CindFhaelad-6e827350> would be
http://example.com/LL/ceniuil_lugdach/CindFhaelad-6e827350. This URL
can be read thus: <slug>/<manuscript>/<item>/<individual>, where
<slug> is the http://example.com prefix. There are a few permutations
of this pattern in the database, but the structure should be similar
enough for most users who are familiar with the source texts to
follow.
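As a sketch of how the URL pattern can be taken apart programmatically, the following Python function splits an entry URL into its four parts. The function name split_entry_url and the returned dictionary keys are illustrative choices, and the sketch assumes the plain three-segment path shown in the example; the permutations mentioned above would need extra handling.

```python
from urllib.parse import urlsplit

def split_entry_url(url):
    """Split an entry URL into slug, manuscript, item and individual."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 3:
        # Assumption: only the <manuscript>/<item>/<individual> form is handled.
        raise ValueError(f"unexpected URL structure: {url}")
    manuscript, item, individual = segments
    return {
        "slug": f"{parts.scheme}://{parts.netloc}",
        "manuscript": manuscript,
        "item": item,
        "individual": individual,
    }

entry = split_entry_url(
    "http://example.com/LL/ceniuil_lugdach/CindFhaelad-6e827350"
)
print(entry)
```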
Each item belongs to a manuscript, and while this is represented in
the URL as described above, the URL alone makes it inconvenient to
address the manuscript itself. To allow for this, and to allow queries
to be narrowed easily by manuscript, an extension to the triple
format, called TRiG, is used. This extension allows for the use of
Named Graphs. In this project, the manuscript is identified by its
URL, which serves as the named graph for the triples. For instance,
from aisneidem_di_araill.trig:
<http://example.com/LL> {
<>
a dctype:Dataset;
dcterms:title "Aisneidem Di Araill"@sga;
dcterms:isFormatOf <http://www.ucc.ie/celt/published/G800011F/text028.html>;
dcterms:format "application/trig" ;
prov:wasDerivedFrom <http://www.ucc.ie/celt/published/G800011F/text028.html> .
<Conchobuir>
a foaf:Person;
irishRel:genName "Conchobuir";
irishRel:nomName "Conchobar";
rel:childOf <Fhactnai>.
<Fhactnai>
a foaf:Person;
irishRel:nomName "Fhactnai".
}
This snippet identifies these triples as being part of the
<http://example.com/LL> graph. In this way, queries can be run against
particular graphs and the user can programmatically determine which
triples belong to which manuscript.
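To illustrate what the named graph adds, the sketch below models each statement as a (graph, subject, predicate, object) tuple and selects those belonging to one manuscript. It is a dependency-free illustration of the idea only: the function name triples_in_graph is made up, the LL data is abbreviated from the snippet above, and the second graph http://example.com/XX with its entry is hypothetical and exists purely for contrast. In practice the same selection would normally be done with an RDF library or a SPARQL GRAPH clause.

```python
LL = "http://example.com/LL"
XX = "http://example.com/XX"  # hypothetical second manuscript graph

# Each statement carries its graph as a fourth component (a "quad").
quads = [
    (LL, "Conchobuir", "rel:childOf", "Fhactnai"),
    (LL, "Conchobuir", "irishRel:nomName", "Conchobar"),
    (XX, "SomePerson", "rel:childOf", "SomeParent"),  # made-up entry
]

def triples_in_graph(quads, graph):
    """Return the (subject, predicate, object) triples stored in one graph."""
    return [(s, p, o) for g, s, p, o in quads if g == graph]

for triple in triples_in_graph(quads, LL):
    print(triple)
```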
While each entry in the genealogy has its own URL, many entries refer
to the same individual. To represent this, owl:sameAs is used to link
these URLs together. This is done within a single item file, across
item files in the same manuscript, and across manuscripts. This
ensures that the various versions of the genealogies are
cross-referenced.
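Because owl:sameAs links are symmetric and transitive, a consumer can collapse linked URLs into one set per individual. The following Python sketch does this with a small union-find structure. The function name same_as_groups and every URL in the sample data are hypothetical; the sketch only shows one way such links might be processed, not how the project itself does it.

```python
from collections import defaultdict

def same_as_groups(pairs):
    """Group URLs connected by owl:sameAs links into sets of equivalents."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two equivalence classes

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())

# Hypothetical links: within a manuscript and across manuscripts.
pairs = [
    ("http://example.com/LL/item_a/Conchobuir",
     "http://example.com/LL/item_b/Conchobuir"),
    ("http://example.com/LL/item_b/Conchobuir",
     "http://example.com/XX/item_c/Conchobar"),
    ("http://example.com/LL/item_a/Lorcan",
     "http://example.com/XX/item_c/Lorcan"),
]
groups = same_as_groups(pairs)
print(len(groups))
```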
Occasionally, individuals will have alternate genealogies. For ease of curation, these alternate genealogies are attached directly where they appear in the manuscript. This will often make an individual look like they have three or more parents.
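A user who wants to locate such entries could count the distinct parents recorded for each individual. The Python sketch below assumes the parent links have already been extracted as (child, parent) pairs; the function name entries_with_extra_parents and the sample names are hypothetical. Parents are collected in a set, so a parent stated through more than one predicate for the same child is counted once.

```python
from collections import defaultdict

def entries_with_extra_parents(child_of, limit=2):
    """Return individuals that have more than `limit` distinct parents."""
    parents = defaultdict(set)
    for child, parent in child_of:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > limit}

# Made-up data: A carries an alternate genealogy, E does not.
child_of = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "B"), ("E", "F")]
flagged = entries_with_extra_parents(child_of)
print(flagged)
```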
There are many instances where individuals are mentioned but have no
name. RDF blank nodes are used to identify these individuals. The
curators chose a format which uses _:missing plus a UUID fragment
generated as above. For instance,
_:missing-04015614
a foaf:Person ;
foaf:gender "female" ;
agrelon:hasChild <Conmáel>, <h-Ér>, <Orbba>, <Ferón>, <Fergna>;
rel:parentOf <Conmáel>, <h-Ér>, <Orbba>, <Ferón>, <Fergna>;
agrelon:hasParent <Militis>;
rel:childOf <Militis>;
agrelon:hasSibling <Díl>;
rel:siblingOf <Díl>.
The alternate form of the blank node is used where convenient.
Often important individuals are credited with founding a clan or tribe. In this case the population group is given its own URL, constructed using the same principles as for a person, as described above. For instance:
<Coscrach>
a foaf:Person;
irishRel:nomName "Coscrach";
agrelon:hasParent <Lorcan>;
rel:childOf <Lorcan>;
irishRel:numChild 12 ;
irishRel:ancestorOfGroup <ClandCosraig>.
<ClandCosraig>
a irishRel:PopulationGroup ;
irishRel:PopulationGroup "Cland Cosraig" .
Occasionally, the manuscript sources give more information about an
individual, which is added to the entry using rdfs:comment. This is
done because the curators wished to capture relevant unstructured
information that provides context for an entry. For instance,
<Lachtna-32e54830>
a foaf:Person;
irishRel:nomName "Lachtna";
agrelon:hasParent <Cennétig>;
rel:childOf <Cennétig>;
irishRel:numChild 0;
rdfs:comment "is é ro gab ríge dar éis Cennetig. Unde dicitur Grianan Lactnai i Creicc Léith...".
There are several utility Perl scripts which ease the creation and
curation of the database. Look in the utils directory for more
information.
More specific information about the project can be found on the blog: IrishGen Occasional Topics.