Open samosafuz opened 1 year ago
As usual, I'm needlessly complicated: instead of generating filenames via base-uri(), it's easier to simply use the value of <idno type="ddb-hybrid">; this way, it's also no longer necessary to convert the output of string(). Sorry.
I wrote an XQuery to retrieve the following values for $file in collection("/db/papyri/idp.data/DDB_EpiDoc_XML")//tei:ref[@type = "reprint-in" or @type = "reprint-from"]:
$file/ancestor::tei:TEI//tei:idno[@type = "ddb-hybrid"]/string()
$file/../tei:ref/@type/string()
$file/@n/string()
The results are now in a Google sheet: https://docs.google.com/spreadsheets/d/1HXyRmGZ5qnBULswYcZIul1OAf84mjUuNMW2NiChuXUA/edit#gid=0
I haven't tried to line up the items that recursively point to one another yet, but this sheet is helpful for identifying places where reprint-in and reprint-from both appear in the @type column, and where @n is empty.
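The two checks mentioned here could be run over an export of the sheet roughly as follows. This is only a sketch: it assumes the three columns arrive as (ddb-hybrid, @type, @n) tuples, that multiple @type values are joined by a space, and the rows marked "example" are invented for illustration.

```python
# Hypothetical rows mirroring the three sheet columns: ddb-hybrid, @type, @n.
# The "example;..." identifiers are invented for illustration only.
rows = [
    ("rom.mil.rec;1;11", "reprint-from", "stud.pal;14;8C"),
    ("example;1;1", "reprint-in reprint-from", "example;2;2"),
    ("example;3;3", "reprint-in", ""),
]


def flag_row(types: str, n: str) -> list[str]:
    """Return the problems found in one row of the sheet."""
    problems = []
    kinds = set(types.split())  # assumes space-separated @type values
    if {"reprint-in", "reprint-from"} <= kinds:
        problems.append("both types")
    if not n.strip():
        problems.append("empty @n")
    return problems


# Keep only the rows that have at least one problem.
flagged = {}
for hybrid, types, n in rows:
    problems = flag_row(types, n)
    if problems:
        flagged[hybrid] = problems

for hybrid, problems in flagged.items():
    print(hybrid, problems)
```

Rows flagged this way are the ones that would need cleaning up before any attempt to follow the reprint chains.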
As samosafuz pointed out, there are various muddles to deal with before being able to do a proper search for cycles on the graph of reprints (e.g. p.oxy;44;3208|chla;47;1420|c.ep.lat;;10), so as not to try following a piece of reprint information with a wrong ddb-hybrid or end up in a blind alley because the @n attribute is missing.
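Once the data is clean, a search for cycles might look like the sketch below. It rests on several assumptions: that a loop is only a fault when every link in it has the same @type (as in the rom.mil.rec case, since a reprint-in answered by a reprint-from is expected), that multiple targets in @n are separated by "|" as in the example above, and that the rows marked "example" are invented. Targets without a row of their own are treated as blind alleys and skipped.

```python
def parse_targets(n: str) -> list[str]:
    """Split an @n value such as 'p.oxy;44;3208|chla;47;1420' into identifiers."""
    return [part.strip() for part in n.split("|") if part.strip()]


def find_cycles(graph: dict[str, list[str]]) -> list[list[str]]:
    """Depth-first search that reports each set of looping identifiers once.

    Not an exhaustive enumeration of every elementary cycle; it records the
    loops it meets while walking the graph.
    """
    cycles, seen_keys, done = [], set(), set()

    def visit(node: str, path: list[str]) -> None:
        if node in path:
            cycle = path[path.index(node):]
            key = frozenset(cycle)
            if key not in seen_keys:
                seen_keys.add(key)
                cycles.append(cycle)
            return
        if node in done or node not in graph:
            return  # already explored, or a blind alley (no row for this target)
        for target in graph[node]:
            visit(target, path + [node])
        done.add(node)

    for start in graph:
        visit(start, [])
    return cycles


# Hypothetical sheet rows (ddb-hybrid, @type, @n); "example" rows are invented.
rows = [
    ("rom.mil.rec;1;11", "reprint-in", "stud.pal;14;8C"),
    ("stud.pal;14;8C", "reprint-in", "rom.mil.rec;1;11"),
    ("example;1;1", "reprint-in", "example;2;2|example;9;9"),
    ("example;2;2", "reprint-in", ""),
]

# Build the graph for one @type only, so that legitimate
# reprint-in / reprint-from pairs are not reported as loops.
graph: dict[str, list[str]] = {}
for hybrid, kind, n in rows:
    if kind == "reprint-in":
        graph.setdefault(hybrid, []).extend(parse_targets(n))

cycles = find_cycles(graph)
print(cycles)
```

The same graph could be rebuilt with kind == "reprint-from" to run the check in the other direction.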
jcowey spotted several clusters of erroneous or incomplete mark-up that lead to muddled reprint information, such as:
He suggested opening tickets to tackle these faults beforehand.
As was noted in the discussion to #180: the dummy header of /ddbdp/rom.mil.rec;1;11 previously had reprint-in where it should have had reprint-from: as a result, both rom.mil.rec;1;11 and stud.pal;14;8C had reprint-in dummy headers pointing to one another, producing a recursive loop that no doubt confused the numbers server and prevented the retrieval of the file.
We should determine whether similar recursive loops occur elsewhere, i.e., where two files have reprint-in or reprint-from dummy headers in which the values of //ref[@type = "reprint-in" or @type = "reprint-from"]/@n point to one another. Something along these lines may work:
Find each //ref[@type = "reprint-in"]
Get its base-uri() (but removing the .xml suffix) as well as the target: //ref[@type = "reprint-in"]/@n/string()
Convert the target string() from ddb-hybrid to ddb-filename format (i.e., replace semi-colons with periods)
Check where the two string() values match
This issue is unique to DDB. The process can be repeated for @type = "reprint-from".
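The steps above could be prototyped outside the database along these lines. This is a sketch under assumptions: the (base-uri, @n) pairs are typed in by hand rather than retrieved, the directory layout under the collection path is hypothetical, and the conversion is the plain semicolon-to-period replacement described above. Under that plain replacement an identifier with an empty volume, such as c.ep.lat;;10, would yield a double period and may need special handling.

```python
from pathlib import PurePosixPath


def filename_key(base_uri: str) -> str:
    """Drop the directory part and the .xml suffix from a base-uri() value."""
    return PurePosixPath(base_uri).stem


def hybrid_to_filename(hybrid: str) -> str:
    """Convert ddb-hybrid to ddb-filename format: semicolons become periods."""
    return hybrid.replace(";", ".")


# Hypothetical (base-uri, @n) pairs for //ref[@type = "reprint-in"].
# The flat directory layout shown here is an assumption.
refs = [
    ("/db/papyri/idp.data/DDB_EpiDoc_XML/rom.mil.rec.1.11.xml", "stud.pal;14;8C"),
    ("/db/papyri/idp.data/DDB_EpiDoc_XML/stud.pal.14.8C.xml", "rom.mil.rec;1;11"),
]

# Map each file to the file its reprint-in header points at.
points_to = {filename_key(uri): hybrid_to_filename(n) for uri, n in refs}

# A pair is suspicious when each file's target is the other file.
mutual = sorted(
    {tuple(sorted((a, b))) for a, b in points_to.items()
     if a != b and points_to.get(b) == a}
)
print(mutual)
```

This only catches two files pointing directly at each other; longer loops would need a proper graph search.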
A related (but separate) issue is instances where //ref[@type = "reprint-in" or @type = "reprint-from"]/@n has a value but //ref is empty, or where //ref has a value but //ref[@type = "reprint-in" or @type = "reprint-from"]/@n is empty. These files should also be identified, so that we can populate everything appropriately.
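One way to find such files is sketched below using Python's standard xml.etree.ElementTree. The sample document is invented and far simpler than a real DDB file; only the TEI namespace and the two @type values are taken as given.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
REPRINT_TYPES = {"reprint-in", "reprint-from"}


def incomplete_refs(xml_text: str) -> list[tuple[str, str]]:
    """List reprint refs where exactly one of @n and the element text is filled."""
    root = ET.fromstring(xml_text)
    problems = []
    for ref in root.iter(TEI + "ref"):
        if ref.get("type") not in REPRINT_TYPES:
            continue
        has_n = bool((ref.get("n") or "").strip())
        has_text = bool("".join(ref.itertext()).strip())
        if has_n and not has_text:
            problems.append((ref.get("type"), "@n set, ref empty"))
        elif has_text and not has_n:
            problems.append((ref.get("type"), "ref has text, @n empty"))
    return problems


# Invented sample: one ref of each faulty kind plus one complete ref.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <ref type="reprint-in" n="stud.pal;14;8C"/>
  <ref type="reprint-from" n="">Stud.Pal. 14 8C</ref>
  <ref type="reprint-in" n="rom.mil.rec;1;11">Rom.Mil.Rec. 1 11</ref>
</body></text></TEI>"""

print(incomplete_refs(sample))
```

Refs where both @n and the text are empty are not reported by this sketch; whether those count as a separate fault is left open.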