PerseusDL / perseus_catalog

http://catalog.perseus.org

Authority Records Showing Both Canonical IDs and Multiple IDs of Related Works #62

Open AlisonBabeu opened 6 years ago

AlisonBabeu commented 6 years ago

While looking through author records I noticed an interesting, if incorrect, phenomenon. Take, for example, the record for Homer: [image]

The list of works includes not only Homer's works with canonical ID tlg0012 (Iliad, Odyssey, Epigrammata); it looks like it has also captured a number of related commentary works. In one way that is awesome, but it is also very confusing, because they are all listed together as works by Homer, not works about Homer.

In fact, the “Full Record” for the third work in the list illustrated above, Fragmenta (a work by Aristarchus, tlg1767.tlg001), suggests that the TLG IDs with the displayLabel “isScholiaTo” in its MODS record are being captured along with the canonical work ID tlg1767.tlg001 and its ctsurn.

[image]

I checked this individual MODS record in GitHub, and it doesn’t appear that the Homer IDs have been commented out, as I thought most had been.

    <mods:identifier type="tlg">1767.001</mods:identifier>
    <mods:identifier displayLabel="isScholiaTo" type="tlg">0012.001</mods:identifier>
    <mods:identifier displayLabel="isScholiaTo" type="tlg">0012.002</mods:identifier>

This seems to be a widespread issue. I found the same thing with several works for Cicero: for example, his list of works includes Pro Milone by Asconius, a commentary on Cicero’s For Milo (urn:cts:latinLit:phi0803.phi003.opp-lat2).

I don’t remember this happening in the last iteration or perhaps I didn’t find it last time?



cwulfman commented 6 years ago

@AlisonBabeu These are the documents with uncommented <mods:identifier displayLabel='isScholiaTo'> elements:

  1. tlg4093.tlg007.opp-grc1.mods1.xml
  2. tlg5022.tlg007.opp-grc1.mods1.xml
  3. tlg5022.tlg006.opp-grc1.mods1.xml
  4. tlg5022.tlg005.opp-grc1.mods1.xml
  5. tlg5022.tlg004.opp-grc1.mods1.xml
  6. tlg5022.tlg003.opp-grc1.mods1.xml
  7. tlg5022.tlg002.opp-grc1.mods1.xml
  8. tlg5022.tlg001.opp-grc1.mods1.xml
  9. tlg5031.tlg002.opp-grc1.mods1.xml
  10. tlg5031.tlg001.opp-grc1.mods1.xml
  11. tlg1767.tlg001.opp-grc1.mods1.xml
  12. tlg5019.tlg001.opp-grc1.mods1.xml
  13. tlg5019.tlg002.opp-grc1.mods1.xml

The <mods:extension> element might be the place to put these relations, as some sort of RDF. (Of course, that's what the FRBRoo data would do.)
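
A minimal sketch of how a record could be checked for live (uncommented) identifiers that carry a displayLabel, using only Python's standard library. The function name `labeled_identifiers` and the `SAMPLE` record are hypothetical and for illustration only; this is not the query that produced the list above. It relies on the fact that ElementTree's default parser drops XML comments, so commented-out identifiers are not reported.

```python
import xml.etree.ElementTree as ET

# The MODS v3 namespace URI.
MODS_NS = "http://www.loc.gov/mods/v3"


def labeled_identifiers(mods_xml):
    """Return (displayLabel, type, value) for every identifier that has a displayLabel.

    Hypothetical helper: commented-out elements are skipped because the
    default ElementTree parser does not keep comments in the tree.
    """
    root = ET.fromstring(mods_xml)
    found = []
    for el in root.iter("{%s}identifier" % MODS_NS):
        label = el.get("displayLabel")
        if label is not None:
            found.append((label, el.get("type"), (el.text or "").strip()))
    return found


# Illustrative record modeled on the snippet quoted in this issue;
# the third identifier is commented out on purpose.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:identifier type="tlg">1767.001</mods:identifier>
  <mods:identifier displayLabel="isScholiaTo" type="tlg">0012.001</mods:identifier>
  <!-- <mods:identifier displayLabel="isScholiaTo" type="tlg">0012.002</mods:identifier> -->
</mods:mods>"""

if __name__ == "__main__":
    for label, id_type, value in labeled_identifiers(SAMPLE):
        print(label, id_type, value)
```

Running the same function over each *.mods1.xml file would give a per-file report of which relation identifiers are still live.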

AlisonBabeu commented 6 years ago

It is also happening with Cicero, as I mentioned before: that record has an uncommented displayLabel="isCommentaryOn" (http://174.138.78.35:8080/exist/apps/PerseusCatalog/versions.html?id=urn:cts:latinLit:phi0803.phi003.opp-lat2). I have a feeling that the current experimental interface has done this with all IDs. And you are quite right that this is something the FRBRoo data will do. I guess the question for the moment is: what should I do now to fix this?

cwulfman commented 6 years ago

Yes -- the EUI (experimental user interface -- we need a better name now) isn't filtering ids in these situations. It could -- it would be wrong, but it could....

Here are all the display labels. Perhaps you want me to filter out any and all ids with displayLabels?

  1. isScholiaTo
  2. IsParaphraseOf
  3. isTranslationOf
  4. isCommentaryOn
  5. isAdaptationOf
  6. ap Galenum
  7. ap Oribasius
  8. ap Paulus
  9. ap Aetium (lib 1-3)
  10. ap Aetium (lib 5,6,8)
  11. ap Aetium (lib 11)
  12. ap Aetium (lib 12)
  13. ap Aetium (lib. 16)
  14. isParaphraseOf
  15. isAttributedTo
  16. isLexiconTo
  17. IsTranslationOf?
  18. isQuotedBy
  19. isSummaryOf
  20. LC Permalink
  21. Corpus Iuris Civilis
  22. IsAdaptationOf
  23. isEpitomeOf
  24. Book IV-VIII
  25. FragmentIn
  26. IsTranslationOf
  27. Book I-III
  28. Digitized SLUB edition
  29. Dithyrambs

AlisonBabeu commented 6 years ago

Well I believe I told you that I once suggested, with little favorable response, that we name the catalog Fred. I need to think on the ID question and filtering for a bit though.

AlisonBabeu commented 6 years ago

Hi @cwulfman, thanks again for the list. You don't need to filter the ap Galenum through ap Aetium labels, since those displayLabels aren't affecting the accuracy of the canonical IDs in question. LC Permalink should also be OK, though usually it is only a displayLabel on URLs, not identifiers. I just wish there were a good way to keep a list of filtered identifiers for an author and split the list into works by them and works about them, but obviously that is for the next iteration, with more sophisticated relationship data.
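
As a rough illustration of the "works by" versus "works about" split described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout (a title plus a list of (value, displayLabel) pairs), the names `normalize` and `split_works`, and the choice of which labels count as relations. Only isScholiaTo and isCommentaryOn are treated as relation labels here, because those are the two cases reported in this issue; labels are lower-cased (and a trailing "?" dropped) so that variants such as IsParaphraseOf and isParaphraseOf would compare equal. Any other label, including the ap Galenum through ap Aetium group, is left unfiltered.

```python
# Hypothetical set of relation labels, stored in normalized (lower-case) form.
RELATION_LABELS = {"isscholiato", "iscommentaryon"}


def normalize(label):
    """Lower-case a displayLabel and drop a trailing '?' so case variants match."""
    return label.strip().rstrip("?").lower()


def split_works(records, author_id, relation_labels=RELATION_LABELS):
    """Split records into titles by the author and titles about the author.

    records: iterable of (title, [(identifier_value, display_label_or_None), ...])
    author_id: the author part of a TLG-style identifier, e.g. "0012"
    """
    prefix = author_id + "."
    works_by, works_about = [], []
    for title, identifiers in records:
        canonical, related = [], []
        for value, label in identifiers:
            if label is not None and normalize(label) in relation_labels:
                related.append(value)    # points at another author's work
            else:
                canonical.append(value)  # unlabeled or unfiltered label
        if any(v.startswith(prefix) for v in canonical):
            works_by.append(title)
        elif any(v.startswith(prefix) for v in related):
            works_about.append(title)
    return works_by, works_about


# Illustrative data based on the identifiers quoted in this issue.
RECORDS = [
    ("Iliad", [("0012.001", None)]),
    ("Odyssey", [("0012.002", None)]),
    ("Fragmenta", [("1767.001", None),
                   ("0012.001", "isScholiaTo"),
                   ("0012.002", "isScholiaTo")]),
]

if __name__ == "__main__":
    by, about = split_works(RECORDS, "0012")
    print("Works by:", by)
    print("Works about:", about)
```

With this split, Fragmenta would appear under works about Homer instead of alongside the Iliad and Odyssey, and it would still be listed under Aristarchus (1767) as a work by him.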