PerseusDL / catalog_data

MODS and MADS data for the Perseus Catalog

multiple MODS records with same cts urn #115

Open cwulfman opened 6 years ago

cwulfman commented 6 years ago

The following CTS URNs are each assigned to multiple MODS records. These look like multi-volume works, each volume of which has the same URN. That's problematic for several reasons, the most immediate having to do with processing editions: e.g., the Teubner edition of Antiquitates Romanae in five volumes (urn:cts:greekLit:tlg0081.tlg001.opp-grc4) is one edition, not five. If there must be a separate record for each volume, would it be possible to augment the URNs with some sort of sequence index, e.g. urn:cts:greekLit:tlg0081.tlg001.opp-grc4-1?

  1. urn:cts:greekLit:tlg0081.tlg001.opp-grc8
  2. urn:cts:greekLit:tlg0081.tlg001.opp-grc4
  3. urn:cts:greekLit:tlg0081.tlg001.opp-eng4
  4. urn:cts:greekLit:tlg0074.tlg001.opp-grc2
  5. urn:cts:greekLit:tlg0074.tlg001.opp-eng4
  6. urn:cts:greekLit:tlg0060.tlg001.opp-grc4
  7. urn:cts:greekLit:tlg0059.tlg034.opp-grc2
  8. urn:cts:greekLit:tlg0032.tlg007.opp-grc3
  9. urn:cts:greekLit:tlg0032.tlg007.opp-eng2
  10. urn:cts:greekLit:tlg0032.tlg006.opp-grc7
  11. urn:cts:greekLit:tlg0032.tlg006.opp-eng5
  12. urn:cts:greekLit:tlg0032.tlg001.opp-grc2
  13. urn:cts:greekLit:tlg0032.tlg001.opp-eng2
  14. urn:cts:greekLit:tlg0016.tlg001.opp-grc4
  15. urn:cts:greekLit:tlg0016.tlg001.opp-grc2
  16. urn:cts:greekLit:tlg0016.tlg001.opp-eng2
  17. urn:cts:greekLit:tlg0016.tlg001.opp-eng1
  18. urn:cts:greekLit:tlg0012.tlg002.opp-grc3
  19. urn:cts:greekLit:tlg0012.tlg002.opp-grc2
  20. urn:cts:greekLit:tlg0012.tlg001.opp-grc5
  21. urn:cts:greekLit:tlg0012.tlg001.opp-grc4
  22. urn:cts:greekLit:tlg0008.tlg001.opp-grc6
  23. urn:cts:greekLit:tlg0008.tlg001.opp-grc11
  24. urn:cts:greekLit:tlg0008.tlg001.opp-eng5
  25. urn:cts:greekLit:tlg0008.tlg001.opp-eng4
  26. urn:cts:greekLit:tlg0007.tlg112.opp-grc2
  27. urn:cts:greekLit:tlg0007.tlg112.opp-eng2
  28. urn:cts:greekLit:tlg0007.tlg082b.opp-grc2
  29. urn:cts:greekLit:tlg0007.tlg080.perseus-grc1
  30. urn:cts:greekLit:tlg0003.tlg001.opp-lat8
  31. urn:cts:greekLit:tlg0003.tlg001.opp-lat2
  32. urn:cts:greekLit:tlg0003.tlg001.opp-lat18
  33. urn:cts:greekLit:tlg0003.tlg001.opp-lat17
  34. urn:cts:greekLit:tlg0003.tlg001.opp-grc9
  35. urn:cts:greekLit:tlg0003.tlg001.opp-grc80
  36. urn:cts:greekLit:tlg0003.tlg001.opp-grc76
  37. urn:cts:greekLit:tlg0003.tlg001.opp-grc74
  38. urn:cts:greekLit:tlg0003.tlg001.opp-grc70
  39. urn:cts:greekLit:tlg0003.tlg001.opp-grc69
  40. urn:cts:greekLit:tlg0003.tlg001.opp-grc65
  41. urn:cts:greekLit:tlg0003.tlg001.opp-grc62
  42. urn:cts:greekLit:tlg0003.tlg001.opp-grc60
  43. urn:cts:greekLit:tlg0003.tlg001.opp-grc6
  44. urn:cts:greekLit:tlg0003.tlg001.opp-grc54
  45. urn:cts:greekLit:tlg0003.tlg001.opp-grc51
  46. urn:cts:greekLit:tlg0003.tlg001.opp-grc49
  47. urn:cts:greekLit:tlg0003.tlg001.opp-grc48
  48. urn:cts:greekLit:tlg0003.tlg001.opp-grc43
  49. urn:cts:greekLit:tlg0003.tlg001.opp-grc36
  50. urn:cts:greekLit:tlg0003.tlg001.opp-grc31
  51. urn:cts:greekLit:tlg0003.tlg001.opp-grc28
  52. urn:cts:greekLit:tlg0003.tlg001.opp-grc26
  53. urn:cts:greekLit:tlg0003.tlg001.opp-grc17
  54. urn:cts:greekLit:tlg0003.tlg001.opp-grc1
  55. urn:cts:greekLit:tlg0003.tlg001.opp-ger4
  56. urn:cts:greekLit:tlg0003.tlg001.opp-eng6
  57. urn:cts:greekLit:tlg0003.tlg001.opp-eng4
  58. urn:cts:greekLit:tlg0003.tlg001.opp-eng12
  59. urn:cts:greekLit:tlg0003.tlg001.opp- grc1
  60. urn:cts:greekLit:fhg0405.fhg001.opp-lat1
  61. urn:cts:greekLit:fhg0397.fhg001.opp-lat1
  62. urn:cts:greekLang:tlg7000.tlg001.perseus-grc4
  63. urn:cts:greekLit:tlg1347.tlg002.opp-lat1
  64. urn:cts:greekLit:tlg1343.tlg002.opp-lat1
  65. urn:cts:greekLit:tlg1337.tlg003.opp-ara1
  66. urn:cts:greekLit:tlg1337.tlg002.opp-ara1
  67. urn:cts:greekLit:tlg1337.tlg001.opp-ara2
  68. urn:cts:greekLit:tlg1337.tlg001.opp-ara1
  69. urn:cts:greekLit:tlg1328.tlg001.opp-lat1
  70. urn:cts:greekLit:tlg1308.tlg002.opp-lat1
  71. urn:cts:greekLit:tlg1305.tlg002.opp-lat1
  72. urn:cts:greekLit:tlg0744.tlg003.opp-grc1
  73. urn:cts:greekLit:tlg0744.tlg003.opp-ger1
  74. urn:cts:greekLit:tlg0638.tlg001.opp-eng1
  75. urn:cts:greekLit:tlg0612.tlg001.opp-grc1
  76. urn:cts:greekLit:tlg0557.tlg001.opp-grc1
  77. urn:cts:greekLit:tlg0557.tlg001.opp-eng2
  78. urn:cts:greekLit:tlg0550.tlg001.opp-lat1
  79. urn:cts:greekLit:tlg0550.tlg001.opp-grc1
  80. urn:cts:greekLit:tlg0542.tlg001.opp-grc1
  81. urn:cts:greekLit:tlg0525.tlg001.opp-grc6
  82. urn:cts:greekLit:tlg0385.tlg001.opp-grc7
  83. urn:cts:greekLit:tlg0385.tlg001.opp-grc6
  84. urn:cts:greekLit:tlg0385.tlg001.opp-grc4
  85. urn:cts:greekLit:tlg0385.tlg001.opp-eng7
  86. urn:cts:greekLit:tlg0385.tlg001.opp-eng15
  87. urn:cts:greekLit:tlg0363.tlg014.opp-grc2
  88. urn:cts:greekLit:tlg0363.tlg001.opp-grc1
  89. urn:cts:greekLit:tlg0363.tlg001.opp-ger1
  90. urn:cts:greekLit:tlg0099.tlg001.opp-grc8
  91. urn:cts:greekLit:tlg0099.tlg001.opp-grc10
  92. urn:cts:greekLit:tlg0099.tlg001.opp-eng11
  93. urn:cts:greekLit:tlg0093.tlg001.opp-grc2
  94. urn:cts:greekLit:tlg0093.tlg001.opp-eng1
  95. urn:cts:greekLit:tlg1799.tlg001.opp-grc3
  96. urn:cts:greekLit:tlg1799.tlg001.opp-lat3
  97. urn:cts:greekLit:tlg1896.tlg002.opp-lat1
  98. urn:cts:greekLit:tlg1901.tlg001.opp-lat1
  99. urn:cts:greekLit:tlg2000.tlg001.opp-grc2
  100. urn:cts:greekLit:tlg2000.tlg001.opp-grc3
  101. urn:cts:greekLang:tlg7000.tlg001.perseus-grc5
  102. urn:cts:greekLit:tlg2018.tlg001.opp-grc1
  103. urn:cts:greekLit:tlg2018.tlg002.opp-eng1
  104. urn:cts:greekLit:tlg2018.tlg002.opp-grc3
  105. urn:cts:greekLit:tlg2032.tlg001.opp-grc4
  106. urn:cts:greekLit:tlg2032.tlg001.opp-lat3
  107. urn:cts:greekLit:tlg2034.tlg014.opp-grc1
  108. urn:cts:greekLit:tlg2037.tlg001.opp-grc1
  109. urn:cts:greekLit:tlg2037.tlg001.opp-grc2
  110. urn:cts:greekLit:tlg2045.tlg001.opp-eng1
  111. urn:cts:greekLit:tlg2045.tlg001.opp-grc1
  112. urn:cts:greekLit:tlg2045.tlg001.opp-grc4
  113. urn:cts:greekLit:tlg2230.tlg001.opp-lat1
  114. urn:cts:greekLit:tlg2249.tlg001.opp-lat1
  115. urn:cts:greekLit:tlg2280.tlg002.opp-lat1
  116. urn:cts:greekLit:tlg2281.tlg001.opp-lat1
  117. urn:cts:greekLit:tlg2289.tlg002.opp-lat1
  118. urn:cts:greekLit:tlg2308.tlg001.opp-lat1
  119. urn:cts:greekLit:tlg2328.tlg003.opp-lat1
  120. urn:cts:greekLit:tlg2434.tlg003.opp-lat1
  121. urn:cts:greekLit:tlg2511.tlg001.opp-lat1
  122. urn:cts:greekLit:tlg2539.tlg003.opp-lat1
  123. urn:cts:greekLit:tlg3135.tlg001.opp-grc3
  124. urn:cts:greekLit:tlg3135.tlg002.opp-grc1
  125. urn:cts:greekLit:tlg4015.tlg009.opp-grc1
  126. urn:cts:greekLit:tlg4029.tlg001.opp-grc2
  127. urn:cts:greekLit:tlg4029.tlg002.perseus-grc1
  128. urn:cts:greekLit:tlg4040.tlg030.opp-grc1
  129. urn:cts:greekLit:tlg4040.tlg030.opp-grc4
  130. urn:cts:greekLit:tlg4040.tlg032.opp-grc1
  131. urn:cts:greekLit:tlg4040.tlg032.opp-grc4
  132. urn:cts:greekLit:tlg9010.tlg001.opp-grc2
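
A minimal sketch of how a list like the one above could be produced, and what the proposed sequence index would look like. The file names and the (record, URN) pairs are illustrative only, and `find_shared_urns` and `with_sequence_index` are hypothetical helpers, not part of the catalog tooling.

```python
from collections import defaultdict


def find_shared_urns(records):
    """Group record ids by CTS URN and keep only URNs used by more than one record."""
    by_urn = defaultdict(list)
    for record_id, urn in records:
        by_urn[urn].append(record_id)
    return {urn: ids for urn, ids in by_urn.items() if len(ids) > 1}


def with_sequence_index(urn, count):
    """Return the proposed per-volume URNs: the shared URN plus -1, -2, ..."""
    return [f"{urn}-{n}" for n in range(1, count + 1)]


# Illustrative input: (record id, CTS URN) pairs as they might be read from MODS files.
records = [
    ("vol1.xml", "urn:cts:greekLit:tlg0081.tlg001.opp-grc4"),
    ("vol2.xml", "urn:cts:greekLit:tlg0081.tlg001.opp-grc4"),
    ("single.xml", "urn:cts:greekLit:tlg0081.tlg001.opp-grc8"),
]

shared = find_shared_urns(records)
for urn, ids in shared.items():
    print(urn, "->", with_sequence_index(urn, len(ids)))
```

Whether a suffix such as `-1` is acceptable within a CTS URN version component is a separate question that this sketch does not settle.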

AlisonBabeu commented 6 years ago

Hi @cwulfman, I'm pretty sure we could alter the URN in various ways, including as you suggest. These multi-volume MODS records grew out of a consolidation effort. Because all of the MODS records within a modsCollection record contain the same URN, they are still technically the same edition: in CTS, essentially, one URN equals one edition. I had originally wanted to keep all the MODS records for each edition because each contains specific information about its volume, such as links to GoogleBooks and other online editions, as well as part information about the work (e.g. Books III-V of the Odyssey).

The bigger problem has been that I had to redirect a number of URNs during the mass consolidation, and for this reason, in the long term, we should probably consider deprecating all of the current URNs and renumbering the editions under various authors from scratch.

For example, if you look at the record for Thucydides Historiae, there are actually only about 30 to 40 editions cataloged but they all have very random URNs, due to the redirecting and consolidation.

BTW, there are also a number of multi-volume editions in catalog_pending too.

cwulfman commented 6 years ago

Here are two re-workings for you to take a look at, @AlisonBabeu. I think MODS does a poor job distinguishing between logical and physical structure and therefore dealing with multi-volume works, but I'm not a cataloguer so I may be missing something. The records attached express the entire edition as a single mods item with constituents for the physical volumes. I think I'd prefer to express logical structure in the MODS (e.g., the 8 books of Thucydides' work), but that's coming "top-down" from the work to the physical items, and I do understand that cataloguers need to deal with things "bottom up" (from the object in their hands). This is where METS becomes useful.

Thoughts?

tlg0003.tlg001.opp-grc1a.mods1.xml.zip tlg0003.tlg001.opp-grc1.mods1.xml.zip

AlisonBabeu commented 6 years ago

Hi @cwulfman. I really like both of those examples; they provide a very elegant solution and contain all of the relevant details for each individual volume, namely online links, TOC with work part data, etc. The original solution of MODS consolidation was largely developed as a way to quickly aggregate individual records and have only one URN per edition.

One question: I noticed that you attached Thucydides' name to his VIAF identifier.

  <name>
    <nameIdentifier type="viaf">46144928073854340420</nameIdentifier>
    <namePart>Thucydides</namePart>
    <role>
      <roleTerm authority="marcrelator" type="code">cre</roleTerm>
    </role>
  </name>
  <name>

whereas in previous Greek Anthology files author names had been identified in the following way using the TLG or other ID, and in many cases these authors also do have VIAFs.

  <name type="personal">
    <nameIdentifier type="tlg">2123</nameIdentifier>
    <displayForm lang="la">palladas</displayForm>
    <role>
      <roleTerm>cre</roleTerm>
    </role>
  </name>

Is there any major reason for the change, or is our way of identifying authors still in flux, as I assume?

cwulfman commented 6 years ago

I like the displayForm solution much better!

cwulfman commented 6 years ago

@AlisonBabeu, thinking more about this: what do you think about making the nameIdentifier type citeurn? That ties the MODS and MADS records together better.

In fact, it's vital: by doing that, you can look up an author's works by searching all the MODS for the citeurn in the MADS.

AlisonBabeu commented 6 years ago

I would be ok with trying that out but I'm having some trouble conceptualizing it entirely. So the CITEURN from an individual authority record/textgroup would then also be found in all of the MODS records for that author/textgroup as well? No matter what else we do, I do also want to keep the same textgroup identifiers for the CTS-URNs, however, since if we stopped using all of those identifiers, our workflow would suddenly be seriously out of synch with the OGL Project.

cwulfman commented 6 years ago

Something has to tie the works, editions, authors, and text groups together, right? If one knows the citeurn of an author, one ought to be able, for example, to execute a relational query (SQL, XQuery) to find all the works for which the (or an) author is that author. A simplified example:

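declare namespace mods = "http://www.loc.gov/mods/v3";

(: .// limits the match to each record's own descendants :)
collection('/db/PerseusCatalogData')//mods:mods[.//mods:nameIdentifier[@type='citeurn'] = 'urn:cite:perseus:author.1403.1']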

This should retrieve all the texts by Thucydides.
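
For anyone without an XML database to hand, the same lookup can be sketched in Python with the standard library. This is only an illustration of the idea: the sample collection is invented, `records_by_author` is a hypothetical helper, and the second author URN is a placeholder.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"mods": MODS_NS}


def records_by_author(root, author_urn):
    """Return every mods:mods element containing a matching citeurn nameIdentifier."""
    matches = []
    for record in root.iter(f"{{{MODS_NS}}}mods"):
        idents = record.findall(".//mods:nameIdentifier[@type='citeurn']", NS)
        if any((i.text or "").strip() == author_urn for i in idents):
            matches.append(record)
    return matches


# Invented sample data; real records would be parsed from the catalog files.
SAMPLE = """<mods:modsCollection xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:mods ID="a">
    <mods:name>
      <mods:nameIdentifier type="citeurn">urn:cite:perseus:author.1403.1</mods:nameIdentifier>
    </mods:name>
  </mods:mods>
  <mods:mods ID="b">
    <mods:name>
      <mods:nameIdentifier type="citeurn">urn:cite:perseus:author.0000.1</mods:nameIdentifier>
    </mods:name>
  </mods:mods>
</mods:modsCollection>"""

root = ET.fromstring(SAMPLE)
hits = records_by_author(root, "urn:cite:perseus:author.1403.1")
print([m.get("ID") for m in hits])
```

The search is scoped to each record's own descendants, so a file holding several records only returns the ones that actually carry the identifier.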

AlisonBabeu commented 6 years ago

Well this would certainly alleviate the issue of creating records for authors with no canonical identifier for them, even though I might still need to do that to accommodate other workflows already in place. Simpler aggregation is definitely needed though.

cwulfman commented 6 years ago

I've created a new issue #121 to carry on this discussion about the nameIdentifier element.

Meanwhile, I've converted all those modsCollections with duplicates of the cts urn into mods records with a single cts urn and constituents. I've pushed these changes to development; take a look and tell me what you think.
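
The conversion script itself is not shown in this thread. As a rough sketch of the general shape such a conversion might take, assuming the CTS URN is stored in an `identifier` element with `type="ctsurn"` and that edition-level title and name are simply copied from the first volume (both assumptions), something like the following would collapse a collection into one record with constituents. `collapse_collection` and the sample data are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Qualify a local tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def is_cts_urn(element):
    # Assumption: the CTS URN lives in <identifier type="ctsurn">.
    return element.tag == q("identifier") and element.get("type") == "ctsurn"


def collapse_collection(collection):
    """Merge the mods records of a modsCollection into one record with constituents."""
    volumes = collection.findall(q("mods"))
    merged = ET.Element(q("mods"))
    # Assumption: title, name, and the shared URN are taken from the first volume.
    for child in volumes[0]:
        if child.tag in (q("titleInfo"), q("name")) or is_cts_urn(child):
            merged.append(copy.deepcopy(child))
    # Every volume is kept whole as a constituent, minus its copy of the shared URN.
    for volume in volumes:
        item = ET.SubElement(merged, q("relatedItem"), {"type": "constituent"})
        for child in volume:
            if not is_cts_urn(child):
                item.append(copy.deepcopy(child))
    return merged


# Invented two-volume sample.
SAMPLE = f"""<mods:modsCollection xmlns:mods="{MODS_NS}">
  <mods:mods>
    <mods:titleInfo><mods:title>Histories</mods:title></mods:titleInfo>
    <mods:identifier type="ctsurn">urn:cts:greekLit:tlg0003.tlg001.opp-eng12</mods:identifier>
  </mods:mods>
  <mods:mods>
    <mods:titleInfo><mods:title>Histories</mods:title></mods:titleInfo>
    <mods:identifier type="ctsurn">urn:cts:greekLit:tlg0003.tlg001.opp-eng12</mods:identifier>
  </mods:mods>
</mods:modsCollection>"""

merged = collapse_collection(ET.fromstring(SAMPLE))
print(ET.tostring(merged, encoding="unicode"))
```

Copying edition-level fields from the first volume is the simplest choice, but it means volume-specific values can end up at the top of the record, so the output would still need review.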

AlisonBabeu commented 6 years ago

Hi @cwulfman, I'm really pleased with this result; it is a much simpler approach. I assume that whatever algorithmic approach you've taken will also work for the modsCollection files in catalog_pending?

And one other thought on these records, I've noticed in the first Thucydides example: urn:cts:greekLit:tlg0003.tlg001.opp-eng12, that the title for the entire record displays as follows:

  <titleInfo xml:lang="en" type="uniform">
    <title>Histories</title>
  </titleInfo>
  <titleInfo>
    <nonSort>The</nonSort>
    <title>history of the Grecian war, in eight books</title>
    <partNumber>Vol I</partNumber>
  </titleInfo>

even though this is the title for just the first volume. Since this title is then repeated in the first <relatedItem type="constituent"> section, it can be very confusing for the user. I'm assuming this title is used because it is the CiteCollection label for the entire CTS-URN in the Cite_Collection tables. I've documented this display issue before and was wondering if there was some way to display just the uniform title at the top of the MODS record. Am I making any sense?
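
One way to picture the display rule being asked for is a small selection function that prefers the uniform title and only falls back to another titleInfo when none is marked uniform. This is a sketch rather than the catalog's actual display code: `display_title` is a hypothetical helper, and the sample omits the MODS namespace, as the excerpt above does.

```python
import xml.etree.ElementTree as ET


def display_title(record):
    """Prefer the uniform title; otherwise fall back to the first titleInfo."""
    infos = record.findall("titleInfo")
    if not infos:
        return None
    chosen = next((t for t in infos if t.get("type") == "uniform"), infos[0])
    parts = [chosen.findtext("nonSort"), chosen.findtext("title")]
    return " ".join(p.strip() for p in parts if p and p.strip())


# The two titleInfo elements quoted above, wrapped in a bare <mods> element.
SAMPLE = """<mods>
  <titleInfo xml:lang="en" type="uniform">
    <title>Histories</title>
  </titleInfo>
  <titleInfo>
    <nonSort>The</nonSort>
    <title>history of the Grecian war, in eight books</title>
    <partNumber>Vol I</partNumber>
  </titleInfo>
</mods>"""

record = ET.fromstring(SAMPLE)
print(display_title(record))  # Histories
```

The volume-level title and its partNumber stay available inside the constituent, so nothing is lost by hiding them from the top-level display.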

AlisonBabeu commented 6 years ago

Sorry about the confusing comment above, @cwulfman. I forgot to close off my XML section and the rest of the comment ended up in the "XML". Do I need to repost the comment or do you mind scrolling to the right? :)

cwulfman commented 6 years ago

(I fixed the comment: you don't need the string "xml" after the ``` quote marks).

You'll almost certainly want to review all those converted records to make sure the titleInfo in the main record really is the uniform title (or whatever the proper title for the work as a whole really is). I grabbed this title from the first mods record in the collection.

And yes: we can apply the same method to catalog_pending!

cwulfman commented 6 years ago

I've merged these into master and pushed to GitHub. Once we've reviewed those uniform titles, we can close this ticket.

AlisonBabeu commented 6 years ago

I will start digging through the uniform titles tomorrow @cwulfman

AlisonBabeu commented 6 years ago

Hi @cwulfman, as I start to dig through these files I realize that a little bit of information got left behind in the original MODS records that I would like to capture when we use this method for catalog_pending, so it may need a bit of tweaking.

As I started to check the first uniform title in this record, I realized that the new MODS record no longer contained the series information anywhere, even though there was unique series information in each volume. For example:

  <mods:relatedItem xmlns="http://www.loc.gov/mods/v3" type="series">
    <mods:titleInfo>
      <mods:title>Loeb classical library</mods:title>
      <mods:partNumber> Volume 319</mods:partNumber>
    </mods:titleInfo>
  </mods:relatedItem>

The top level aggregation also only contains the publication date for the first volume <mods:dateIssued>1937</mods:dateIssued>, which could be confusing to users, because in the case of the example I've used, the seven volumes were published between 1937 and 1950. I apologize that I didn't notice or think about this type of data when I first reviewed the replacements for the modsCollection files.

Since this current commit covers only about 130 records, I am reasonably content to add the missing information back in by hand for these. Moving forward with this method, though, could we include this information in the aggregation as well?
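
A sketch of how the two missing pieces might be gathered during aggregation, assuming each volume keeps its own `dateIssued` and series `relatedItem` inside its constituent. `issued_range` and `series_parts` are hypothetical helpers, and the sample record is invented for illustration, reusing the years and the series entry mentioned above.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}


def issued_range(record):
    """Return 'first-last' over all numeric dateIssued values, or a single year."""
    years = sorted({
        int(el.text.strip())
        for el in record.iterfind(".//mods:dateIssued", NS)
        if el.text and el.text.strip().isdigit()
    })
    if not years:
        return None
    return str(years[0]) if len(years) == 1 else f"{years[0]}-{years[-1]}"


def series_parts(record):
    """Collect (series title, part number) pairs from every series relatedItem."""
    pairs = []
    for series in record.iterfind(".//mods:relatedItem[@type='series']", NS):
        title = series.findtext("mods:titleInfo/mods:title", default="", namespaces=NS)
        number = series.findtext("mods:titleInfo/mods:partNumber", default="", namespaces=NS)
        pairs.append((title.strip(), number.strip()))
    return pairs


# Invented sample: two constituents, only the first carrying series data.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:relatedItem type="constituent">
    <mods:originInfo><mods:dateIssued>1937</mods:dateIssued></mods:originInfo>
    <mods:relatedItem type="series">
      <mods:titleInfo>
        <mods:title>Loeb classical library</mods:title>
        <mods:partNumber> Volume 319</mods:partNumber>
      </mods:titleInfo>
    </mods:relatedItem>
  </mods:relatedItem>
  <mods:relatedItem type="constituent">
    <mods:originInfo><mods:dateIssued>1950</mods:dateIssued></mods:originInfo>
  </mods:relatedItem>
</mods:mods>"""

record = ET.fromstring(SAMPLE)
print(issued_range(record))
print(series_parts(record))
```

Non-numeric dates (for example bracketed or approximate years) are skipped here, so real records would likely need a more forgiving parser.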