multiple MODS records with same cts urn

cwulfman commented 6 years ago

The following CTS URNs are each assigned to multiple MODS records. These look like multi-volume works in which every volume carries the same URN. That's problematic for several reasons, the most immediate having to do with processing editions: e.g., the Teubner edition of Antiquitates Romanae in five volumes (urn:cts:greekLit:tlg0081.tlg001.opp-grc4) is one edition, not five. If there must be a separate record for each volume, would it be possible to augment the URNs with some sort of sequence index, e.g. urn:cts:greekLit:tlg0081.tlg001.opp-grc4-1?

urn:cts:greekLit:tlg0081.tlg001.opp-grc8
urn:cts:greekLit:tlg0081.tlg001.opp-grc4
urn:cts:greekLit:tlg0081.tlg001.opp-eng4
urn:cts:greekLit:tlg0074.tlg001.opp-grc2
urn:cts:greekLit:tlg0074.tlg001.opp-eng4
urn:cts:greekLit:tlg0060.tlg001.opp-grc4
urn:cts:greekLit:tlg0059.tlg034.opp-grc2
urn:cts:greekLit:tlg0032.tlg007.opp-grc3
urn:cts:greekLit:tlg0032.tlg007.opp-eng2
urn:cts:greekLit:tlg0032.tlg006.opp-grc7
urn:cts:greekLit:tlg0032.tlg006.opp-eng5
urn:cts:greekLit:tlg0032.tlg001.opp-grc2
urn:cts:greekLit:tlg0032.tlg001.opp-eng2
urn:cts:greekLit:tlg0016.tlg001.opp-grc4
urn:cts:greekLit:tlg0016.tlg001.opp-grc2
urn:cts:greekLit:tlg0016.tlg001.opp-eng2
urn:cts:greekLit:tlg0016.tlg001.opp-eng1
urn:cts:greekLit:tlg0012.tlg002.opp-grc3
urn:cts:greekLit:tlg0012.tlg002.opp-grc2
urn:cts:greekLit:tlg0012.tlg001.opp-grc5
urn:cts:greekLit:tlg0012.tlg001.opp-grc4
urn:cts:greekLit:tlg0008.tlg001.opp-grc6
urn:cts:greekLit:tlg0008.tlg001.opp-grc11
urn:cts:greekLit:tlg0008.tlg001.opp-eng5
urn:cts:greekLit:tlg0008.tlg001.opp-eng4
urn:cts:greekLit:tlg0007.tlg112.opp-grc2
urn:cts:greekLit:tlg0007.tlg112.opp-eng2
urn:cts:greekLit:tlg0007.tlg082b.opp-grc2
urn:cts:greekLit:tlg0007.tlg080.perseus-grc1
urn:cts:greekLit:tlg0003.tlg001.opp-lat8
urn:cts:greekLit:tlg0003.tlg001.opp-lat2
urn:cts:greekLit:tlg0003.tlg001.opp-lat18
urn:cts:greekLit:tlg0003.tlg001.opp-lat17
urn:cts:greekLit:tlg0003.tlg001.opp-grc9
urn:cts:greekLit:tlg0003.tlg001.opp-grc80
urn:cts:greekLit:tlg0003.tlg001.opp-grc76
urn:cts:greekLit:tlg0003.tlg001.opp-grc74
urn:cts:greekLit:tlg0003.tlg001.opp-grc70
urn:cts:greekLit:tlg0003.tlg001.opp-grc69
urn:cts:greekLit:tlg0003.tlg001.opp-grc65
urn:cts:greekLit:tlg0003.tlg001.opp-grc62
urn:cts:greekLit:tlg0003.tlg001.opp-grc60
urn:cts:greekLit:tlg0003.tlg001.opp-grc6
urn:cts:greekLit:tlg0003.tlg001.opp-grc54
urn:cts:greekLit:tlg0003.tlg001.opp-grc51
urn:cts:greekLit:tlg0003.tlg001.opp-grc49
urn:cts:greekLit:tlg0003.tlg001.opp-grc48
urn:cts:greekLit:tlg0003.tlg001.opp-grc43
urn:cts:greekLit:tlg0003.tlg001.opp-grc36
urn:cts:greekLit:tlg0003.tlg001.opp-grc31
urn:cts:greekLit:tlg0003.tlg001.opp-grc28
urn:cts:greekLit:tlg0003.tlg001.opp-grc26
urn:cts:greekLit:tlg0003.tlg001.opp-grc17
urn:cts:greekLit:tlg0003.tlg001.opp-grc1
urn:cts:greekLit:tlg0003.tlg001.opp-ger4
urn:cts:greekLit:tlg0003.tlg001.opp-eng6
urn:cts:greekLit:tlg0003.tlg001.opp-eng4
urn:cts:greekLit:tlg0003.tlg001.opp-eng12
urn:cts:greekLit:tlg0003.tlg001.opp- grc1
urn:cts:greekLit:fhg0405.fhg001.opp-lat1
urn:cts:greekLit:fhg0397.fhg001.opp-lat1
urn:cts:greekLang:tlg7000.tlg001.perseus-grc4
urn:cts:greekLit:tlg1347.tlg002.opp-lat1
urn:cts:greekLit:tlg1343.tlg002.opp-lat1
urn:cts:greekLit:tlg1337.tlg003.opp-ara1
urn:cts:greekLit:tlg1337.tlg002.opp-ara1
urn:cts:greekLit:tlg1337.tlg001.opp-ara2
urn:cts:greekLit:tlg1337.tlg001.opp-ara1
urn:cts:greekLit:tlg1328.tlg001.opp-lat1
urn:cts:greekLit:tlg1308.tlg002.opp-lat1
urn:cts:greekLit:tlg1305.tlg002.opp-lat1
urn:cts:greekLit:tlg0744.tlg003.opp-grc1
urn:cts:greekLit:tlg0744.tlg003.opp-ger1
urn:cts:greekLit:tlg0638.tlg001.opp-eng1
urn:cts:greekLit:tlg0612.tlg001.opp-grc1
urn:cts:greekLit:tlg0557.tlg001.opp-grc1
urn:cts:greekLit:tlg0557.tlg001.opp-eng2
urn:cts:greekLit:tlg0550.tlg001.opp-lat1
urn:cts:greekLit:tlg0550.tlg001.opp-grc1
urn:cts:greekLit:tlg0542.tlg001.opp-grc1
urn:cts:greekLit:tlg0525.tlg001.opp-grc6
urn:cts:greekLit:tlg0385.tlg001.opp-grc7
urn:cts:greekLit:tlg0385.tlg001.opp-grc6
urn:cts:greekLit:tlg0385.tlg001.opp-grc4
urn:cts:greekLit:tlg0385.tlg001.opp-eng7
urn:cts:greekLit:tlg0385.tlg001.opp-eng15
urn:cts:greekLit:tlg0363.tlg014.opp-grc2
urn:cts:greekLit:tlg0363.tlg001.opp-grc1
urn:cts:greekLit:tlg0363.tlg001.opp-ger1
urn:cts:greekLit:tlg0099.tlg001.opp-grc8
urn:cts:greekLit:tlg0099.tlg001.opp-grc10
urn:cts:greekLit:tlg0099.tlg001.opp-eng11
urn:cts:greekLit:tlg0093.tlg001.opp-grc2
urn:cts:greekLit:tlg0093.tlg001.opp-eng1
urn:cts:greekLit:tlg1799.tlg001.opp-grc3
urn:cts:greekLit:tlg1799.tlg001.opp-lat3
urn:cts:greekLit:tlg1896.tlg002.opp-lat1
urn:cts:greekLit:tlg1901.tlg001.opp-lat1
urn:cts:greekLit:tlg2000.tlg001.opp-grc2
urn:cts:greekLit:tlg2000.tlg001.opp-grc3
urn:cts:greekLang:tlg7000.tlg001.perseus-grc5
urn:cts:greekLit:tlg2018.tlg001.opp-grc1
urn:cts:greekLit:tlg2018.tlg002.opp-eng1
urn:cts:greekLit:tlg2018.tlg002.opp-grc3
urn:cts:greekLit:tlg2032.tlg001.opp-grc4
urn:cts:greekLit:tlg2032.tlg001.opp-lat3
urn:cts:greekLit:tlg2034.tlg014.opp-grc1
urn:cts:greekLit:tlg2037.tlg001.opp-grc1
urn:cts:greekLit:tlg2037.tlg001.opp-grc2
urn:cts:greekLit:tlg2045.tlg001.opp-eng1
urn:cts:greekLit:tlg2045.tlg001.opp-grc1
urn:cts:greekLit:tlg2045.tlg001.opp-grc4
urn:cts:greekLit:tlg2230.tlg001.opp-lat1
urn:cts:greekLit:tlg2249.tlg001.opp-lat1
urn:cts:greekLit:tlg2280.tlg002.opp-lat1
urn:cts:greekLit:tlg2281.tlg001.opp-lat1
urn:cts:greekLit:tlg2289.tlg002.opp-lat1
urn:cts:greekLit:tlg2308.tlg001.opp-lat1
urn:cts:greekLit:tlg2328.tlg003.opp-lat1
urn:cts:greekLit:tlg2434.tlg003.opp-lat1
urn:cts:greekLit:tlg2511.tlg001.opp-lat1
urn:cts:greekLit:tlg2539.tlg003.opp-lat1
urn:cts:greekLit:tlg3135.tlg001.opp-grc3
urn:cts:greekLit:tlg3135.tlg002.opp-grc1
urn:cts:greekLit:tlg4015.tlg009.opp-grc1
urn:cts:greekLit:tlg4029.tlg001.opp-grc2
urn:cts:greekLit:tlg4029.tlg002.perseus-grc1
urn:cts:greekLit:tlg4040.tlg030.opp-grc1
urn:cts:greekLit:tlg4040.tlg030.opp-grc4
urn:cts:greekLit:tlg4040.tlg032.opp-grc1
urn:cts:greekLit:tlg4040.tlg032.opp-grc4
urn:cts:greekLit:tlg9010.tlg001.opp-grc2
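
For anyone who wants to reproduce a list like this, here is a minimal Python sketch. It is not the script that produced the list above. It assumes each record keeps its URN in a top-level identifier element with type="ctsurn" in the MODS v3 namespace; the function name and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MODS_URI = "http://www.loc.gov/mods/v3"
NS = {"mods": MODS_URI}


def duplicated_cts_urns(xml_text):
    """Return the CTS URNs carried by more than one mods:mods record."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    # iter() also yields the root, so a bare mods:mods document works too.
    for record in root.iter("{%s}mods" % MODS_URI):
        # Only direct children are checked, so identifiers nested in
        # relatedItem elements are ignored. A set counts each URN once
        # per record even if the record repeats it.
        urns = {
            (ident.text or "").strip()
            for ident in record.findall("mods:identifier[@type='ctsurn']", NS)
        }
        counts.update(urn for urn in urns if urn)
    return sorted(urn for urn, n in counts.items() if n > 1)


# Hypothetical sample: two volume records share one URN, a third does not.
SAMPLE = """
<mods:modsCollection xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:mods><mods:identifier type="ctsurn">urn:cts:greekLit:tlg0081.tlg001.opp-grc4</mods:identifier></mods:mods>
  <mods:mods><mods:identifier type="ctsurn">urn:cts:greekLit:tlg0081.tlg001.opp-grc4</mods:identifier></mods:mods>
  <mods:mods><mods:identifier type="ctsurn">urn:cts:greekLit:tlg0060.tlg001.opp-grc4</mods:identifier></mods:mods>
</mods:modsCollection>
"""

print(duplicated_cts_urns(SAMPLE))
```

To cover a whole repository, the same counter would have to be fed from every file rather than from one string.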

AlisonBabeu commented 6 years ago

Hi @cwulfman, I'm pretty sure we could alter the URN in various ways, including as you suggest. These multi-volume MODS records grew out of a consolidation effort. Even though every MODS record within a modsCollection contains the same URN, together they are still technically one edition, since they share a single URN; in CTS, essentially, one URN equals one edition. I had originally wanted to keep all the MODS records for each edition because each contains volume-specific information, such as links to Google Books and other online editions, as well as part information about the work (e.g. Books III-V of the Odyssey).

The bigger problem is that I had to redirect a number of URNs during the mass consolidation. For this reason, in the long term, we should probably consider deprecating all of the current URNs and renumbering the editions under various authors from scratch.

For example, if you look at the record for Thucydides Historiae, there are actually only about 30 to 40 editions cataloged but they all have very random URNs, due to the redirecting and consolidation.

BTW, there are also a number of multi-volume editions in catalog_pending too.

cwulfman commented 6 years ago

Here are two re-workings for you to take a look at, @AlisonBabeu. I think MODS does a poor job of distinguishing between logical and physical structure, and therefore of dealing with multi-volume works, but I'm not a cataloguer so I may be missing something. The attached records express the entire edition as a single MODS item with constituents for the physical volumes. I think I'd prefer to express logical structure in the MODS (e.g., the 8 books of Thucydides' work), but that's coming "top-down" from the work to the physical items, and I do understand that cataloguers need to deal with things "bottom-up" (from the object in their hands). This is where METS becomes useful.

Thoughts?

tlg0003.tlg001.opp-grc1a.mods1.xml.zip tlg0003.tlg001.opp-grc1.mods1.xml.zip

AlisonBabeu commented 6 years ago

Hi @cwulfman. I really like both of those examples; they provide a very elegant solution and contain all of the relevant details for each individual volume, namely online links, TOC with work part data, etc. The original MODS consolidation was largely developed as a way to quickly aggregate individual records and have only one URN per edition.

One question: I noticed that you attached Thucydides' name to his VIAF identifier:

  <name>
    <nameIdentifier type="viaf">46144928073854340420</nameIdentifier>
    <namePart>Thucydides</namePart>
    <role>
      <roleTerm authority="marcrelator" type="code">cre</roleTerm>
    </role>
  </name>

whereas in previous Greek Anthology files author names had been identified in the following way, using the TLG or other ID (and in many cases these authors also have VIAF IDs):

  <name type="personal">
    <nameIdentifier type="tlg">2123</nameIdentifier>
    <displayForm lang="la">palladas</displayForm>
    <role>
      <roleTerm>cre</roleTerm>
    </role>
  </name>

Is there a major reason for the change, or is our way of identifying authors still in flux?

cwulfman commented 6 years ago

I like the displayForm solution much better!

cwulfman commented 6 years ago

@AlisonBabeu, thinking more about this: what do you think about making the nameIdentifier type citeurn? That ties the MODS and MADS records together better.

In fact, it's vital: by doing that, you can look up an author's works by searching all the MODS records for the citeurn in the MADS record.

AlisonBabeu commented 6 years ago

I would be OK with trying that out, but I'm having some trouble conceptualizing it entirely. So the CITEURN from an individual authority record/textgroup would then also be found in all of the MODS records for that author/textgroup? No matter what else we do, I do want to keep the same textgroup identifiers for the CTS-URNs, however, since if we stopped using those identifiers, our workflow would suddenly be seriously out of sync with the OGL Project.

cwulfman commented 6 years ago

Something has to tie the works, editions, authors, and text groups together, right? If one knows the citeurn of an author, one ought to be able, for example, to execute a relational query (SQL, XQuery) to find all the works for which the (or an) author is that author. A simplified example:

declare namespace mods = "http://www.loc.gov/mods/v3";
collection('/db/PerseusCatalogData')//mods:mods[.//mods:nameIdentifier[@type='citeurn'] = 'urn:cite:perseus:author.1403.1']

This should retrieve all the texts by Thucydides.
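
The same lookup can be sketched outside an XML database. This is a rough Python equivalent under stated assumptions: it takes the XML as a string rather than a collection path, the function name is invented, and the second author URN in the sample is a placeholder. The search is scoped to each record, which matters when several records share one file.

```python
import xml.etree.ElementTree as ET

MODS_URI = "http://www.loc.gov/mods/v3"
MODS = "{%s}" % MODS_URI
NS = {"mods": MODS_URI}


def titles_by_author(xml_text, author_urn):
    """Return the first title of every record naming author_urn as a citeurn."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.iter(MODS + "mods"):
        # Look only inside this record for citeurn name identifiers.
        ids = {
            (el.text or "").strip()
            for el in record.iter(MODS + "nameIdentifier")
            if el.get("type") == "citeurn"
        }
        if author_urn in ids:
            titles.append(
                record.findtext("mods:titleInfo/mods:title", default="", namespaces=NS)
            )
    return titles


# Hypothetical sample; author.0000.1 is a placeholder, not a real identifier.
SAMPLE = """
<mods:modsCollection xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:mods>
    <mods:titleInfo><mods:title>Histories</mods:title></mods:titleInfo>
    <mods:name>
      <mods:nameIdentifier type="citeurn">urn:cite:perseus:author.1403.1</mods:nameIdentifier>
    </mods:name>
  </mods:mods>
  <mods:mods>
    <mods:titleInfo><mods:title>Some other work</mods:title></mods:titleInfo>
    <mods:name>
      <mods:nameIdentifier type="citeurn">urn:cite:perseus:author.0000.1</mods:nameIdentifier>
    </mods:name>
  </mods:mods>
</mods:modsCollection>
"""

print(titles_by_author(SAMPLE, "urn:cite:perseus:author.1403.1"))
```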

AlisonBabeu commented 6 years ago

Well this would certainly alleviate the issue of creating records for authors with no canonical identifier for them, even though I might still need to do that to accommodate other workflows already in place. Simpler aggregation is definitely needed though.

cwulfman commented 6 years ago

I've created a new issue #121 to carry on this discussion about the nameIdentifier element.

Meanwhile, I've converted all those modsCollections with duplicates of the cts urn into mods records with a single cts urn and constituents. I've pushed these changes to development; take a look and tell me what you think.
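
For readers following along, the shape of that conversion can be sketched roughly as below. This is an illustration only, not the code that produced the pushed changes. It assumes the URN lives in a top-level identifier with type="ctsurn", and it copies edition-level title and name data from the first volume, a choice that needs human review. All names in the sketch are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

MODS_URI = "http://www.loc.gov/mods/v3"
MODS = "{%s}" % MODS_URI
NS = {"mods": MODS_URI}
ET.register_namespace("mods", MODS_URI)


def collapse_collection(collection):
    """Merge a modsCollection of per-volume records into one mods record."""
    volumes = collection.findall("mods:mods", NS)
    if not volumes:
        raise ValueError("modsCollection contains no mods records")
    first = volumes[0]
    merged = ET.Element(MODS + "mods")
    # Assumption: edition-level title and name are taken from the first volume.
    for tag in ("titleInfo", "name"):
        for child in first.findall("mods:" + tag, NS):
            merged.append(copy.deepcopy(child))
    # The shared CTS URN is kept exactly once, on the merged record.
    urn = first.find("mods:identifier[@type='ctsurn']", NS)
    if urn is not None:
        merged.append(copy.deepcopy(urn))
    # Each original record becomes a constituent, minus its copy of the URN.
    for volume in volumes:
        constituent = ET.SubElement(merged, MODS + "relatedItem", {"type": "constituent"})
        for child in volume:
            if child.tag == MODS + "identifier" and child.get("type") == "ctsurn":
                continue
            constituent.append(copy.deepcopy(child))
    return merged


# Hypothetical two-volume collection.
SAMPLE = """
<mods:modsCollection xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:mods>
    <mods:titleInfo><mods:title>Histories</mods:title></mods:titleInfo>
    <mods:part><mods:detail type="volume"><mods:number>1</mods:number></mods:detail></mods:part>
    <mods:identifier type="ctsurn">urn:cts:greekLit:tlg0003.tlg001.opp-eng12</mods:identifier>
  </mods:mods>
  <mods:mods>
    <mods:titleInfo><mods:title>Histories</mods:title></mods:titleInfo>
    <mods:part><mods:detail type="volume"><mods:number>2</mods:number></mods:detail></mods:part>
    <mods:identifier type="ctsurn">urn:cts:greekLit:tlg0003.tlg001.opp-eng12</mods:identifier>
  </mods:mods>
</mods:modsCollection>
"""

merged = collapse_collection(ET.fromstring(SAMPLE))
print(ET.tostring(merged, encoding="unicode"))
```

Because every child of each volume other than the URN is copied into its constituent, per-volume details such as links and part data survive the merge.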

AlisonBabeu commented 6 years ago

Hi @cwulfman, I'm really pleased with this result; it is a much simpler approach. Will whatever algorithmic approach you've taken also work for the modsCollection files in catalog_pending?

One other thought on these records: in the first Thucydides example, urn:cts:greekLit:tlg0003.tlg001.opp-eng12, the title for the entire record displays as follows:

  <titleInfo xml:lang="en" type="uniform">
    <title>Histories</title>
  </titleInfo>
  <titleInfo>
    <nonSort>The</nonSort>
    <title>history of the Grecian war, in eight books</title>
    <partNumber>Vol I</partNumber>
  </titleInfo>

This is the title for just the first volume, and since it is then repeated in the first <relatedItem type="constituent"> section, it can be very confusing for the user. I'm assuming this title is used because it is the CiteCollection label for the entire CTS-URN in the Cite_Collection tables. I've documented this display issue before and was wondering whether there is some way to display just the uniform title in the top part of the MODS record. Am I making any sense?
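
One way a display layer might prefer the uniform title is sketched below in Python. This is a hypothetical helper, not part of the catalog code. It assumes the uniform title is a top-level titleInfo with type="uniform" and falls back to the first titleInfo otherwise.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}


def display_title(record):
    """Prefer the record's uniform title; otherwise use its first titleInfo."""
    # Only direct children are consulted, so constituent titles are ignored.
    uniform = record.findtext(
        "mods:titleInfo[@type='uniform']/mods:title", default="", namespaces=NS
    ).strip()
    if uniform:
        return uniform
    info = record.find("mods:titleInfo", NS)
    if info is None:
        return ""
    parts = [
        info.findtext("mods:nonSort", default="", namespaces=NS),
        info.findtext("mods:title", default="", namespaces=NS),
    ]
    return " ".join(p.strip() for p in parts if p.strip())


SAMPLE = """
<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo xml:lang="en" type="uniform">
    <title>Histories</title>
  </titleInfo>
  <titleInfo>
    <nonSort>The</nonSort>
    <title>history of the Grecian war, in eight books</title>
    <partNumber>Vol I</partNumber>
  </titleInfo>
</mods>
"""

print(display_title(ET.fromstring(SAMPLE)))
```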

AlisonBabeu commented 6 years ago

Sorry about the confusing comment above, @cwulfman. I forgot to close off my XML section and the rest of the comment ended up in the "XML". Do I need to repost the comment, or do you mind scrolling to the right? :)

cwulfman commented 6 years ago

(I fixed the comment: you don't need the string "xml" after the ``` quote marks).

You'll almost certainly want to review all those converted records to make sure the titleInfo in the main record really is the uniform title (or whatever the proper title for the work as a whole really is). I grabbed this title from the first MODS record in the collection.

And yes: we can apply the same method to catalog_pending!

cwulfman commented 6 years ago

I've merged these into master and pushed to GitHub. Once we've reviewed those uniform titles, we can close this ticket.

AlisonBabeu commented 6 years ago

I will start digging through the uniform titles tomorrow @cwulfman

AlisonBabeu commented 6 years ago

hi @cwulfman, as I start to dig through these files I realize that there is a little bit of information that got left behind in the original MODS records that I would like to capture when we use this method for catalog_pending, so it may need a bit of tweaking.

As I started to check the first uniform title in this record

I realized that the new MODS record no longer contained the series information anywhere, and there was unique series information in each volume. For example:

  <mods:relatedItem xmlns="http://www.loc.gov/mods/v3" type="series">
    <mods:titleInfo>
      <mods:title>Loeb classical library</mods:title>
      <mods:partNumber> Volume 319</mods:partNumber>
    </mods:titleInfo>
  </mods:relatedItem>

The top-level aggregation also contains only the publication date for the first volume, <mods:dateIssued>1937</mods:dateIssued>, which could be confusing to users because, in the example I've used, the seven volumes were published between 1937 and 1950. I apologize that I didn't notice or think about this type of data when I first reviewed the replacements for the modsCollection files.

Since this current commit covers only about 130 records, I am reasonably content to add the missing information back in by hand for these, but moving forward with this method, could we include this information in the aggregation as well?
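
To make the request concrete, here is a small Python sketch of the two pieces of data the aggregation could carry: a publication span and the per-volume series statements. It assumes each volume sits in a relatedItem of type "constituent" that keeps its own originInfo and series relatedItem. The function names are invented, and the sample is cut down to two volumes for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}
CONSTITUENT = "mods:relatedItem[@type='constituent']"


def publication_span(record):
    """Return 'first-last' over the constituents' dateIssued years, or None."""
    years = []
    for date in record.findall(CONSTITUENT + "/mods:originInfo/mods:dateIssued", NS):
        text = (date.text or "").strip()
        if text.isdigit():  # skip non-numeric dates such as "[19--]"
            years.append(int(text))
    if not years:
        return None
    first, last = min(years), max(years)
    return str(first) if first == last else f"{first}-{last}"


def series_statements(record):
    """Return one 'series title, part number' string per constituent series."""
    statements = []
    for series in record.findall(CONSTITUENT + "/mods:relatedItem[@type='series']", NS):
        title = series.findtext("mods:titleInfo/mods:title", default="", namespaces=NS).strip()
        part = series.findtext("mods:titleInfo/mods:partNumber", default="", namespaces=NS).strip()
        statements.append(f"{title}, {part}" if part else title)
    return statements


SAMPLE = """
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:relatedItem type="constituent">
    <mods:originInfo><mods:dateIssued>1937</mods:dateIssued></mods:originInfo>
    <mods:relatedItem type="series">
      <mods:titleInfo>
        <mods:title>Loeb classical library</mods:title>
        <mods:partNumber> Volume 319</mods:partNumber>
      </mods:titleInfo>
    </mods:relatedItem>
  </mods:relatedItem>
  <mods:relatedItem type="constituent">
    <mods:originInfo><mods:dateIssued>1950</mods:dateIssued></mods:originInfo>
  </mods:relatedItem>
</mods:mods>
"""

record = ET.fromstring(SAMPLE)
print(publication_span(record))
print(series_statements(record))
```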

PerseusDL / catalog_data

multiple MODS records with same cts urn #115