ErwinKomen / RU-passim


HUWA import: process JSON manuscripts #534

Closed ErwinKomen closed 1 year ago

ErwinKomen commented 2 years ago

Part of issue #530

Import the manuscripts from the HUWA-generated JSON file

  1. Make sure that the correct manuscript JSON is generated from Huwa (see issue #532)
  2. Adapt the existing 'import manuscript JSON' function to process this Huwa manuscript JSON

ErwinKomen commented 2 years ago

Questions that need answering before importing

  1. If a library doesn't exist (yet) in Passim, but is specified in the HUWA (json) with a valid country, city and library name, should that library be created in PASSIM?
  2. Does issue #557 still hold: do not create the link from a sermon to the SSG (=AF) yet?
  3. If a (new HUWA) manuscript contains a reference to a Clavis/Gryson that is not yet existing, may the necessary items be created to process it: a new Signature entry, a new SermonGold entry, and a link from that SG to the correct SSG (=AF)?

MennaRempt commented 2 years ago

  1. Yes, see excel.
  2. The link from sermon to SSG should be made through the field inhalt in HUWA, since this is where the manuscript contents are described. Opera with only one link to inhalt, should not be linked to an AF, but just imported as sermon manifestations.
  3. Yes!

-- Erwin: Okay, just double-checking point (2) about when a link from sermon to SSG may be made. One example that makes me wonder is the following: a sermon in a manuscript occurs only once in inhalt, yet its signatures (as linked through inhalt) point to an SG and SSG that are already 'on board'. Since the opera occurs only once in inhalt, should the link to the SSG still not be made?

{
  "id": 11,
  "idno": "279",
  "handschrift_id": 15,
  "support": "",
  "extent": "",
  "format": "",
  "library_id": null,
  "library": "Biblioth\u00e8que municipale",
  "lcity": "Tours",
  "lcountry": "France",
  "msitems": [
    {
      "order": 1,
      "parent": "",
      "firstchild": "",
      "next": 2,
      "sermon": {
        "type": "Plain",
        "locus": "v94-r96",
        "author": "",
        "author_id": null,
        "title": "",
        "incipit": "",
        "explicit": "",
        "note": "",
        "keywords": [ "HUWA" ],
        "datasets": [ "HUWA_manuscripts" ],
        "signaturesA": [
          { "editype": "cl", "code": "CPL 284.60", "gold": 1461, "ssg": 1326 },
          { "editype": "gr", "code": "AU s 60", "gold": 1461, "ssg": 1326 }
        ],
        "opera_id": 5741,
        "manu_count": 1
      }
    },

Response: true, act as shown above. Don't make an SSG link in this kind of situation!

ErwinKomen commented 2 years ago

Notes for the implementation

  1. As for libraries:
    1. The libraries that could be added have already been added via an approval process
    2. Remaining libraries: do not add any of them automatically, since they either lack an element (name, city, country) or their name differs from what it should be (see separate Excel)
    3. Make doubly sure that a library is only 'selected' when the lcountry, lcity and library name all three match the library inside Passim (Menna's note of Oct 5/2022)
  2. When a Sermon is imported from JSON, double check the manu_count variable.
    1. If it is just 1, then do not create an SG on the basis of it, nor make a link with a corresponding SSG
  3. If a (new HUWA) manuscript contains a reference to a Clavis/Gryson that is not yet existing: create the necessary items
    1. a new Signature entry,
    2. a new SermonGold entry,
    3. a link from that SG to the correct SSG (=AF)
    4. Make sure that the Project in the newly created Signature / SG is set correctly (does that have a project link?)
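
The library rule (1.3) and the manu_count rule (2.1) above could be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Passim code: `find_library` and `may_create_sg_and_link` are hypothetical helper names, and the known libraries are assumed to be plain dicts with `country`, `city` and `name` keys. Only the JSON field names (`lcountry`, `lcity`, `library`, `manu_count`) come from the example above.

```python
from typing import Dict, List, Optional


def find_library(entry: Dict, libraries: List[Dict]) -> Optional[Dict]:
    """Return the known library whose country, city and name all match.

    Hypothetical helper: nothing is created automatically. If any of the
    three elements is missing, or no exact match exists, None is returned.
    """
    key = tuple((entry.get(field) or "").strip()
                for field in ("lcountry", "lcity", "library"))
    if not all(key):
        # At least one of country, city or name is missing
        return None
    for lib in libraries:
        if (lib["country"], lib["city"], lib["name"]) == key:
            return lib
    return None


def may_create_sg_and_link(sermon: Dict) -> bool:
    """Only sermons whose opera occurs more than once may lead to an SG / SSG link."""
    return (sermon.get("manu_count") or 0) > 1
```

With the example sermon shown earlier (`manu_count` equal to 1), `may_create_sg_and_link` would return `False`, so no SG is created and no SSG link is made.
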

ErwinKomen commented 2 years ago

Small fixes

These are some small fixes that came up while importing

  1. Manuscript.custom_add(): adding to notes must not use self but obj
  2. Right now the JSON has the imported sermons added to dataset HUWA_manuscripts, while imported manuscripts are not added to a dataset. This is a fix for issue #532
    1. The datasets for sermons should be HUWA_sermons - done
    2. The datasets for manuscripts should be HUWA_manuscripts - done
  3. Some HUWA manuscripts are defined multiple times (i.e. same shelfmark, multiple entries in table handschrift)
    1. Signal whether a Huwa manuscript is taken up in Passim
    2. Add them all, but add a remark in the debug output - done: whole list is printed in one go in the output
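
Spotting the manuscripts from point 3 that are defined multiple times could look roughly like this. It is a sketch only: `find_repeated_shelfmarks` is a hypothetical helper, and it assumes each record is a dict carrying the `lcountry`, `lcity`, `library`, `idno` and `handschrift_id` keys seen in the manuscript JSON.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_repeated_shelfmarks(records: List[Dict]) -> Dict[Tuple, List[int]]:
    """Group handschrift ids by (country, city, library, shelfmark).

    Only groups with more than one handschrift entry are returned, so the
    whole list can be printed in one go in the debug output.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["lcountry"], rec["lcity"], rec["library"], rec["idno"])
        groups[key].append(rec["handschrift_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

The manuscripts themselves are still all added; the returned mapping only feeds the remark in the output.
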

ErwinKomen commented 2 years ago

Action points for JSON-to-DB process

For the process of importing a JSON Manuscript into Passim, some of the matters above lead to action points. This is from issue #532

  1. Process Manuscript.provenances into the appropriate Codico - done
  2. Process Manuscript.scribeinfo into the appropriate Codico - done
  3. Process Manuscript.scriptinfo into the appropriate Codico - done

Some more point(s) from issue #596:

  1. Process nebenwerk for those SermonDescr records that have been read from the opera table, and that have an entry in table nebenwerk (see issue #596)

Some more point(s) from issue #532:

  1. Process codico_name - added
  2. Process codico_notes - added
  3. Process siglen at the manuscript level
    1. The information in it will automatically be incorporated into the manuscript's raw field, and it can be visualized from there

Other TODO's

  1. Visualize siglen information from raw into the details view - done
    1. This is equivalent to visualizing the edinote field in EqualGold edit view - done

ErwinKomen commented 1 year ago

More action points

Upon verification of the process, there are some more things that need to be dealt with.

  1. Attributed author: is not added, where it should be.
    1. E.g. from manuscript Cod. 12 H, sermon 74, the author should be Augustinus, especially since author_id is given. However, no author is assigned in the process of reading.
    2. First occurrence in test set is sermon 57, author Cyprianus, manuscript Cod. 14920-22
    3. Remedied in issue #532
  2. The second problem is that autor_inms seems to be ignored: the author assigned to 'all' items of a manuscript is not taken over (this should be 'overruled' by autor_opera for any specific opera that has an author specified there). This is a problem that should be resolved first in issue #532
    1. This has now been remedied.
  3. The title of both Manuscript and Codico turned out to be "SUPPLY A NAME", which is not really desirable
    1. Adapted the code to supply a name to M and C + added the needed codico.save() call ... :)
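
The author precedence described in point 2 can be illustrated with a small sketch. `resolve_author` is a hypothetical function, not the one used in Passim; it assumes both values arrive as optional strings, with the opera-level author (autor_opera) taking priority over the manuscript-wide author (autor_inms).

```python
from typing import Optional


def resolve_author(autor_opera: Optional[str],
                   autor_inms: Optional[str]) -> Optional[str]:
    """Pick the author for one sermon.

    The opera-specific author overrules the manuscript-wide one; if neither
    is specified, no author is assigned.
    """
    for candidate in (autor_opera, autor_inms):
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```
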

ErwinKomen commented 1 year ago

Other points to be aware of

It is possible that two manuscripts end up with the same or almost the same idno (identifier), e.g.: [image]

Note the last two manuscripts. The same item is ultimately in view. But the top item Cod. 641 (with two spaces) is from HUWA and the bottom item cod. 641 (with one space) is from SERMONES.
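
One way such near-identical identifiers could be detected is by comparing a normalized form of the idno. The sketch below is an assumption about how that might be done, not something Passim is known to do; `idno_key` is a hypothetical helper that collapses runs of whitespace and ignores case.

```python
import re


def idno_key(idno: str) -> str:
    """Return a comparison key: whitespace runs collapsed, case ignored."""
    return re.sub(r"\s+", " ", idno).strip().casefold()
```

Two shelfmarks that differ only in spacing or capitalisation then yield the same key, so they can be flagged for manual review rather than merged automatically.
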

ErwinKomen commented 1 year ago

Okay, just checking, but look at opera 1478 that is part of handschrift 8: [image]

The field abk is not used, nor are the bemerkungen here. Double check that with the HUWA to passim Excel with directions...

The fields above should first of all be processed in making the JSON, as described in issue #532

Okay, at our side (importing from JSON) it means:

ErwinKomen commented 1 year ago

All issues above have been addressed. Perhaps this is enough...