Open cwulfman opened 6 years ago
Hi @cwulfman, I'm pretty sure we could alter the URN in various ways, including as you suggest. These multi-volume MODS records grew out of a consolidation effort. Even though all of the MODS records within a modsCollection record contain the same URN, they are still all technically the same edition, since they all have only one URN. In CTS, essentially, one URN equals one edition. I had originally wanted to keep all the MODS records for each edition because they each contain specific information about the volume, such as links to GoogleBooks and other online editions, as well as part information about the work (e.g. Books III-V of the Odyssey).
The bigger problem has been that I had to redirect a number of URNs during the mass consolidation, and for this reason, in the long term, we should probably consider deprecating all of the current URNs and renumbering the editions under various authors from scratch.
For example, if you look at the record for Thucydides Historiae, there are actually only about 30 to 40 editions cataloged but they all have very random URNs, due to the redirecting and consolidation.
BTW, there are also a number of multi-volume editions in catalog_pending too.
Here are two re-workings for you to take a look at, @AlisonBabeu. I think MODS does a poor job distinguishing between logical and physical structure, and therefore of dealing with multi-volume works, but I'm not a cataloguer so I may be missing something. The records attached express the entire edition as a single mods item with constituents for the physical volumes. I think I'd prefer to express logical structure in the MODS (e.g., the 8 books of Thucydides' work), but that's coming "top-down" from the work to the physical items, and I do understand that cataloguers need to deal with things "bottom up" (from the object in their hands). This is where METS becomes useful.
Thoughts?
tlg0003.tlg001.opp-grc1a.mods1.xml.zip tlg0003.tlg001.opp-grc1.mods1.xml.zip
Hi @cwulfman. I really like both of those examples; they provide a very elegant solution and contain all of the relevant details for each individual volume, namely online links, TOC with work part data, etc. The original solution of MODS consolidation was largely developed as a way to quickly aggregate individual records and have only one URN per edition.
One question: I noticed that you attached Thucydides' name to his VIAF identifier,
<name>
<nameIdentifier type="viaf">46144928073854340420</nameIdentifier>
<namePart>Thucydides</namePart>
<role>
<roleTerm authority="marcrelator" type="code">cre</roleTerm>
</role>
</name>
whereas in previous Greek Anthology files author names had been identified in the following way using the TLG or other ID, and in many cases these authors also do have VIAFs.
<name type="personal">
<nameIdentifier type="tlg">2123</nameIdentifier>
<displayForm lang="la">palladas</displayForm>
<role>
<roleTerm>cre</roleTerm>
</role>
</name>
Are there any major reasons for the change, or is our way of identifying authors still in flux?
I like the displayForm solution much better!
@AlisonBabeu , thinking more about this: what do you think about making the nameIdentifier type citeurn? That ties the MODS and MADS records together better.
In fact, it's vital: by doing that, you can look up an author's works by searching all the MODS for the citeurn in the MADS.
I would be ok with trying that out but I'm having some trouble conceptualizing it entirely. So the CITEURN from an individual authority record/textgroup would then also be found in all of the MODS records for that author/textgroup as well? No matter what else we do, I do also want to keep the same textgroup identifiers for the CTS-URNs, however, since if we stopped using all of those identifiers, our workflow would suddenly be seriously out of synch with the OGL Project.
Something has to tie the works, editions, authors, and text groups together, right? If one knows the citeurn of an author, one ought to be able, for example, to execute a relational query (SQL, XQuery) to find all the works for which the (or an) author is that author. A simplified example:
collection('/db/PerseusCatalogData')//mods:mods[.//mods:nameIdentifier[@type='citeurn'] = 'urn:cite:perseus:author.1403.1']
This should retrieve all the texts by Thucydides.
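For anyone who wants to try the same lookup outside an XML database, here is a minimal Python sketch of the query logic. It assumes records carry a `nameIdentifier` with `type="citeurn"` as proposed above; the sample records, their titles, and the second author URN are invented for illustration and do not come from the catalog.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"mods": MODS_NS}

# Hypothetical sample data: titles and the second author URN are placeholders.
SAMPLE = """
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods>
    <titleInfo><title>Record A</title></titleInfo>
    <name>
      <nameIdentifier type="citeurn">urn:cite:perseus:author.1403.1</nameIdentifier>
    </name>
  </mods>
  <mods>
    <titleInfo><title>Record B</title></titleInfo>
    <name>
      <nameIdentifier type="citeurn">urn:cite:perseus:author.0000.1</nameIdentifier>
    </name>
  </mods>
</modsCollection>
"""


def records_by_author(root, cite_urn):
    """Return each mods element that has a citeurn nameIdentifier equal to cite_urn."""
    hits = []
    for record in root.iter(f"{{{MODS_NS}}}mods"):
        # The leading "." keeps the search inside this record only.
        ids = record.findall(".//mods:nameIdentifier[@type='citeurn']", NS)
        if any((i.text or "").strip() == cite_urn for i in ids):
            hits.append(record)
    return hits


root = ET.fromstring(SAMPLE.strip())
matches = records_by_author(root, "urn:cite:perseus:author.1403.1")
found_titles = [m.findtext("mods:titleInfo/mods:title", namespaces=NS) for m in matches]
print(found_titles)
```

The search is scoped to each record (`.//`), so a file that holds several records only returns the ones that actually name the author.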
Well this would certainly alleviate the issue of creating records for authors with no canonical identifier for them, even though I might still need to do that to accommodate other workflows already in place. Simpler aggregation is definitely needed though.
I've created a new issue #121 to carry on this discussion about the nameIdentifier element.
Meanwhile, I've converted all those modsCollections with duplicates of the cts urn into mods records with a single cts urn and constituents. I've pushed these changes to development; take a look and tell me what you think.
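A rough sketch of what such a conversion could look like, for readers following along. This is not the script that was actually run: the function name `collapse`, the `type="ctsurn"` identifier convention, and the sample titles and URLs are all assumptions made for illustration.

```python
import copy
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Qualify a local name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def collapse(collection):
    """Turn a modsCollection whose records share one CTS URN into one mods with constituents."""
    volumes = collection.findall(q("mods"))
    urns = {
        ident.text.strip()
        for vol in volumes
        for ident in vol.findall(q("identifier"))
        if ident.get("type") == "ctsurn" and ident.text
    }
    if len(urns) != 1:
        raise ValueError(f"expected exactly one shared CTS URN, found {sorted(urns)}")

    merged = ET.Element(q("mods"))
    # The edition-level title is copied from the first volume and needs human review.
    first_title = volumes[0].find(q("titleInfo"))
    if first_title is not None:
        merged.append(copy.deepcopy(first_title))
    ident = ET.SubElement(merged, q("identifier"), {"type": "ctsurn"})
    ident.text = urns.pop()

    for vol in volumes:
        item = ET.SubElement(merged, q("relatedItem"), {"type": "constituent"})
        for child in vol:
            # The URN now lives only at the edition level.
            if child.tag == q("identifier") and child.get("type") == "ctsurn":
                continue
            item.append(copy.deepcopy(child))
    return merged


# Hypothetical two-volume collection; titles and URLs are placeholders.
SAMPLE = """
<mods:modsCollection xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:mods>
    <mods:titleInfo><mods:title>Volume one title (placeholder)</mods:title></mods:titleInfo>
    <mods:identifier type="ctsurn">urn:cts:greekLit:tlg0003.tlg001.opp-grc1</mods:identifier>
    <mods:location><mods:url>https://example.org/vol1</mods:url></mods:location>
  </mods:mods>
  <mods:mods>
    <mods:titleInfo><mods:title>Volume two title (placeholder)</mods:title></mods:titleInfo>
    <mods:identifier type="ctsurn">urn:cts:greekLit:tlg0003.tlg001.opp-grc1</mods:identifier>
    <mods:location><mods:url>https://example.org/vol2</mods:url></mods:location>
  </mods:mods>
</mods:modsCollection>
"""

merged = collapse(ET.fromstring(SAMPLE.strip()))
print(ET.tostring(merged, encoding="unicode"))
```

Everything volume-specific (links, part information) stays inside its own constituent, so nothing is lost by dropping the duplicate URNs.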
Hi @cwulfman, I'm really pleased with this result; it is a much simpler approach. Will whatever algorithmic approach you've taken also work with the modsCollection files in catalog_pending?
And one other thought on these records, I've noticed in the first Thucydides example: urn:cts:greekLit:tlg0003.tlg001.opp-eng12, that the title for the entire record displays as follows:
<titleInfo xml:lang="en" type="uniform">
<title>Histories</title>
</titleInfo>
<titleInfo>
<nonSort>The</nonSort>
<title>history of the Grecian war, in eight books</title>
<partNumber>Vol I</partNumber>
</titleInfo>
even though this is the title for just the first volume, since this title is then repeated in the first <relatedItem type="constituent">
section, it can be very confusing for the user. I'm assuming this title is used because it is the CiteCollection label for the entire CTS-URN in the Cite_Collection tables. I've documented this display issue before and was wondering if there was some way to perhaps just display the uniform title at the top part of the MODS record. Am I making any sense?
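One way a display layer could prefer the uniform title is sketched below. The function name `display_title` is hypothetical and nothing here reflects how the catalog front end actually chooses its label; the sample reuses the two titleInfo elements quoted above.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

SAMPLE = """
<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo xml:lang="en" type="uniform">
    <title>Histories</title>
  </titleInfo>
  <titleInfo>
    <nonSort>The</nonSort>
    <title>history of the Grecian war, in eight books</title>
    <partNumber>Vol I</partNumber>
  </titleInfo>
</mods>
"""


def display_title(record):
    """Prefer the uniform title; fall back to the first titleInfo if there is none."""
    infos = record.findall("mods:titleInfo", NS)
    if not infos:
        return None
    uniform = [t for t in infos if t.get("type") == "uniform"]
    chosen = uniform[0] if uniform else infos[0]
    parts = [
        chosen.findtext(f"mods:{tag}", default="", namespaces=NS).strip()
        for tag in ("nonSort", "title")
    ]
    return " ".join(p for p in parts if p) or None


record = ET.fromstring(SAMPLE.strip())
label = display_title(record)
print(label)
```

Only direct children of the record are considered, so titles inside constituents never leak into the top-level label.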
Sorry about the confusing comment above @cwulfman. I forgot to close off my XML section and the rest of the comment ended up in the "XML". Do I need to report the comment or do you mind scrolling to the right? :)
(I fixed the comment: you don't need the string "xml" after the ``` quote marks).
You'll almost certainly want to review all those converted records to make sure the titleInfo in the main record really is the uniform title (or whatever the proper title for the work as a whole really is). I grabbed this title from the first mods record in the collection.
And yes: we can apply the same method to catalog_pending!
I've merged these into master and pushed to GitHub. Once we've reviewed those uniform titles, we can close this ticket.
I will start digging through the uniform titles tomorrow @cwulfman
hi @cwulfman, as I start to dig through these files I realize that there is a little bit of information that got left behind in the original MODS records that I would like to capture when we use this method for catalog_pending, so it may need a bit of tweaking.
As I started to check the first uniform title in this record
I realized that the new MODS record no longer contained the series information anywhere, and there was unique series information in each volume. For example:
<mods:relatedItem xmlns="http://www.loc.gov/mods/v3" type="series">
<mods:titleInfo>
<mods:title>Loeb classical library</mods:title>
<mods:partNumber> Volume 319</mods:partNumber>
</mods:titleInfo>
</mods:relatedItem>
The top level aggregation also only contains the publication date for the first volume
<mods:dateIssued>1937</mods:dateIssued>,
which could be confusing to users, because in the case of the example I've used, the seven volumes were published between 1937 and 1950. I apologize that I didn't notice or think about this type of data when I first reviewed the replacements for the modsCollection files.
Since this current commit covers only about 130 records, I am reasonably content to add the missing information back in by hand for these, but moving forward with this method, could we include this information in the aggregation as well?
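As a sketch of how the date part of this could be automated, the snippet below derives a top-level date range from the constituents' own `dateIssued` values. The assumptions are that each constituent keeps its `originInfo`, that dates are plain four-digit years, and that a `point="start"`/`point="end"` pair is an acceptable encoding for the edition. The sample has only two volumes for brevity, using the 1937 and 1950 endpoints mentioned above.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"mods": MODS_NS}

SAMPLE = """
<mods xmlns="http://www.loc.gov/mods/v3">
  <originInfo><dateIssued>1937</dateIssued></originInfo>
  <relatedItem type="constituent">
    <originInfo><dateIssued>1937</dateIssued></originInfo>
  </relatedItem>
  <relatedItem type="constituent">
    <originInfo><dateIssued>1950</dateIssued></originInfo>
  </relatedItem>
</mods>
"""


def issued_range(record):
    """Return (earliest, latest) year found in the constituents, or None."""
    path = "mods:relatedItem[@type='constituent']/mods:originInfo/mods:dateIssued"
    years = []
    for el in record.findall(path, NS):
        text = (el.text or "").strip()
        if text.isdigit():  # skip anything that is not a plain year
            years.append(int(text))
    return (min(years), max(years)) if years else None


def set_issued_range(record):
    """Replace the top-level dateIssued with a start/end pair covering all volumes."""
    span = issued_range(record)
    if span is None:
        return
    origin = record.find("mods:originInfo", NS)
    if origin is None:
        origin = ET.SubElement(record, f"{{{MODS_NS}}}originInfo")
    for old in origin.findall("mods:dateIssued", NS):
        origin.remove(old)
    for point, year in zip(("start", "end"), span):
        el = ET.SubElement(origin, f"{{{MODS_NS}}}dateIssued", {"point": point})
        el.text = str(year)


record = ET.fromstring(SAMPLE.strip())
set_issued_range(record)
top_dates = [(d.get("point"), d.text) for d in record.findall("mods:originInfo/mods:dateIssued", NS)]
print(top_dates)
```

Series information could be carried over in a similar way by copying each volume's `relatedItem type="series"` into its constituent rather than dropping it.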
The following CTS urns are assigned to multiple MODS records: looks like these are multi-volume works, each volume of which has the same urn. That's problematic for several reasons, the most immediate having to do with processing editions: e.g., the Teubner edition of Antiquitates Romanae in five volumes (urn:cts:greekLit:tlg0081.tlg001.opp-grc4) is one edition, not 5. If there must be a separate record for each volume, would it be possible to augment the urns with some sort of sequence index, e.g. urn:cts:greekLit:tlg0081.tlg001.opp-grc4-1?
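To make the two ideas in this report concrete, here is a small Python sketch: one helper that finds URNs shared by more than one record, and one that generates the suggested sequence-indexed form. The filenames are hypothetical placeholders, and whether a `-1` suffix is acceptable CTS URN syntax is left open; this only illustrates the string pattern proposed above.

```python
from collections import defaultdict


def shared_urns(records):
    """Group (filename, urn) pairs and keep only URNs used by more than one file."""
    groups = defaultdict(list)
    for filename, urn in records:
        groups[urn].append(filename)
    return {urn: files for urn, files in groups.items() if len(files) > 1}


def indexed_urns(urn, count):
    """Build the proposed per-volume URNs: the edition URN plus '-1', '-2', ..."""
    return [f"{urn}-{i}" for i in range(1, count + 1)]


# Hypothetical filenames for a five-volume edition plus one unrelated record.
EDITION = "urn:cts:greekLit:tlg0081.tlg001.opp-grc4"
RECORDS = [(f"volume{i}.mods.xml", EDITION) for i in range(1, 6)]
RECORDS.append(("other.mods.xml", "urn:cts:greekLit:tlg0003.tlg001.opp-grc1"))

duplicates = shared_urns(RECORDS)
for urn, files in duplicates.items():
    for filename, new_urn in zip(files, indexed_urns(urn, len(files))):
        print(filename, "->", new_urn)
```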