AlisonBabeu opened this issue 6 years ago.
Interesting.
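```xquery
(: Count citeurn identifiers whose value is urn:cite:perseus:author.570.1 :)
declare namespace mads="http://www.loc.gov/mads/v2";
let $hits := collection('/db/PerseusCatalogData/mads')//mads:identifier[@type='citeurn' and . = 'urn:cite:perseus:author.570.1']
return count($hits)
```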
This yields only one hit, but `ag` finds the identifier in two files:
```
15:47 $ ag 'urn:cite:perseus:author.570.1' .
PrimaryAuthors/E/Erinna/author.570.1.mads.xml
37: <mads:identifier type="citeurn">urn:cite:perseus:author.570.1</mads:identifier>
PrimaryAuthors/L/Linus_Historicus/author.570.1.mads.xml
15: <mads:identifier type="citeurn">urn:cite:perseus:author.570.1</mads:identifier>
✔ ~/repos/github/PerseusDL/catalog_data/mads [pending_review L|✔]
```
I see what happened here: my import into eXist flattened the directories in PrimaryAuthors, so one `author.570.1.mads.xml` record overwrote the other. I'll adjust this and give you a full report shortly.
```xquery
xquery version "3.1";
declare namespace mads="http://www.loc.gov/mads/v2";
let $hits := collection('/db/PerseusCatalogData/mads')//mads:identifier[@type='citeurn']
return
<count total="{count($hits)}" distinct="{count(distinct-values($hits))}"/>
```
yields `<count total="2343" distinct="2342"/>`
and
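```xquery
declare namespace mads="http://www.loc.gov/mads/v2";
(: Return every citeurn identifier whose value occurs more than once :)
let $hits := collection('/db/PerseusCatalogData/mads')//mads:identifier[@type='citeurn']
for $hit in $hits
where count($hits[. = $hit]) > 1
return $hit
```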
yields
```xml
<mads:identifier xmlns:mads="http://www.loc.gov/mads/v2" type="citeurn">urn:cite:perseus:author.570.1</mads:identifier>
<mads:identifier xmlns:mads="http://www.loc.gov/mads/v2" type="citeurn">urn:cite:perseus:author.570.1</mads:identifier>
```
So it looks like that's the only duplicate.
But attached is the list.
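A variant of the duplicate check above, sketched under the assumption that the eXist instance accepts XQuery 3.x `group by` (the queries above already declare `xquery version "3.1"`). It reports each repeated URN once, together with the stored document(s) that carry it, rather than returning every duplicate node, and it avoids the pairwise `count($hits[. = $hit])` comparison. The output element names `duplicates`, `urn`, and `doc` are illustrative choices, not an existing format.

```xquery
xquery version "3.1";
declare namespace mads="http://www.loc.gov/mads/v2";

(: Group every citeurn identifier by its normalized string value, keep only
   groups with more than one member, and list the document each came from. :)
<duplicates>{
  for $hit in collection('/db/PerseusCatalogData/mads')//mads:identifier[@type='citeurn']
  group by $urn := normalize-space($hit)
  where count($hit) > 1
  order by $urn
  return
    <urn value="{$urn}" count="{count($hit)}">{
      (: after grouping, $hit is bound to all identifiers in this group :)
      for $h in $hit
      return <doc>{document-uri(root($h))}</doc>
    }</urn>
}</duplicates>
```

If the directory structure is preserved on import, the `doc` paths show which PrimaryAuthors subdirectory each conflicting record lives in.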
Hi @cwulfman, as I work through the authority record/text group project, I realized that several author CITE URNs were reassigned for some reason during one of the latest updates.
To begin:

1) Two authority records, one for Erinna and one for Linus O., both contain

```xml
<mads:identifier type="citeurn">urn:cite:perseus:author.570.1</mads:identifier>
```

This CITE URN was first assigned to Erinna, and while it is still in her MADS record, it is no longer in the CITE Collection authors table, which means you can't find her authority record in the Perseus Catalog.

2) There are two entries for Linus O. in the CITE Collection authors table, with two CITE URNs. One is

```xml
<mads:identifier type="citeurn">urn:cite:perseus:author.570.1</mads:identifier>
```

as above, and the other is `urn:cite:perseus:author.1462.1`.
The problem is that this second CITE URN, if you search in catalog_data, actually belongs to Verrius Flaccus. Because it was reassigned, you also can no longer find Verrius Flaccus in the CITE Collection authors table or his authority record in the Perseus Catalog. So in both cases these authors have textgroups but no authority records.

Would it be possible to export a list of CITE URNs in XML or CSV from the MADS records in catalog_data so I can see if there are any other duplicates? Thanks!
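A minimal sketch of such a CSV export, run against the `/db/PerseusCatalogData/mads` collection in eXist. It assumes eXist honours the W3C `output:method "text"` serialization option; `local:csv-field` is a hypothetical helper name introduced here. Rows are sorted by URN so that any duplicates end up on adjacent lines.

```xquery
xquery version "3.1";
declare namespace mads="http://www.loc.gov/mads/v2";
declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";
(: assumption: eXist applies this option and serializes the result as plain text :)
declare option output:method "text";

(: Hypothetical helper: quote a CSV field, doubling any embedded double quotes :)
declare function local:csv-field($s as xs:string) as xs:string {
  '"' || replace($s, '"', '""') || '"'
};

let $rows :=
  for $id in collection('/db/PerseusCatalogData/mads')//mads:identifier[@type='citeurn']
  let $urn := normalize-space($id)
  order by $urn
  (: one row per identifier: the URN and the stored document it came from :)
  return local:csv-field($urn) || ',' || local:csv-field(string(document-uri(root($id))))
return string-join(('urn,document', $rows), '&#10;')
```

Keeping the document URI next to each URN makes a collision like the Erinna / Linus O. case visible directly in a spreadsheet.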