Closed ErwinKomen closed 1 year ago
Below is an example of Huwa inhalt.
- `EqualGoldHuwaToJson`
- `huwa_passim_library.json`
- `manuscripts`
- `ManuscriptHuwaToJson`, which uses `EqualGoldHuwaToJson`, but with `import_type` set to `manu`
- `EqualGoldHuwaToJson`, in the `get_data()` method

Questions:
- There is no handschrift with id `0`, but the first section has 16 items linked to it. What should be done with the first section?
- How do the fields `von_bis` and `bis_f` need to be interpreted?
'von_bis' should be the page/folio number where the manifestation starts and 'bis_f' the page/folio number where the manifestation ends (although there are some very strange numbers here, like negative numbers, so we have to ask CW for additional explanation). NB: for the locus, the fields 'von_bis' + 'von_rv' have to be combined to form the start folio/page number, and the fields 'bis_f' + 'bis_rv' have to be combined to form the end page/folio number.
- It looks like the number before the period is the folio number / page number?
- But then for Handschrift `3`, there are numbers after the period - what does that point to?
The numbers after the period could potentially indicate the line number at which the manifestation starts, but this does not seem to be compatible with what I see when I scroll down the list in the database... We have to ask for additional explanation here, too.
Handschrift no. 3 is strange anyway, since it has no shelfmark and links to an empty library record (no name), which links to an empty location record (no name, not linked to a country). We'll ask CW what to do with this record, exactly.
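The locus rule from the answer above (`von_bis` + `von_rv` for the start, `bis_f` + `bis_rv` for the end) could be sketched roughly as follows. This is only an illustration: the name `make_locus`, the `12r-14v` output format and the decision to skip negative or decimal values are assumptions, not the actual `get_locus()` code, and the odd values still need CW's explanation.

```python
def make_locus(von_bis, von_rv, bis_f, bis_rv):
    """Combine HUWA folio fields into a locus string such as '12r-14v'.

    Hypothetical helper: the output format is an assumption.
    """
    def part(number, side):
        # Skip values that cannot be interpreted yet (missing, negative, decimal)
        if number is None:
            return ""
        text = str(number).strip()
        if not text.isdigit():
            return ""
        return text + (side or "").strip()

    start = part(von_bis, von_rv)
    end = part(bis_f, bis_rv)
    if start and end and start != end:
        return "{}-{}".format(start, end)
    # Only one usable side (or both equal): return whatever is available
    return start or end
```

With this sketch a negative start such as `-110` is simply left out, so only the end folio would appear in the locus until it is clear what such values mean.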
- There are quite a few records in `inhalt` that have an opera id `0`. That means these are not connected with a sermon (manifestation) within the Huwa database. What should be done with these entries?
These should be imported as sermon manifestations, but not linked to AFs.
EK: I'm not sure how that works. If opera id is `0`, then there is no matching opera record, hence no sermon manifestation specification. So would you like me to just add 'empty' sermon manifestations for those situations?
No, as sermon manifestations with the information found in the table inhalt (i.e. locus) and in the other tables linked to inhalt:
The document "HUWA-mapping_01.xlsx" (in the wrkgrp folder, data > HUWA database) gives the corresponding PASSIM fields for the HUWA ones.
(responding to the above): I see now. Sorry for the confusion! Importing manuscripts works a bit differently...
- `EqualGoldExternal`. This record holds the `opera` id (as field `externalid`) as well as the SSG/AF id (as field `equal.id`)
- `inhalt` record might e.g. contain a reference to opera id 143. We should then check whether `EqualGoldExternal` contains a link to an SSG/AF for this `externalid` of `143`. And then we should make a link from the newly imported sermon manifestation to the corresponding SSG/AF (via table `SermonDescrEqual`).

Okay, picking this up again...
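For reference, the lookup described above (opera id in `inhalt`, then `EqualGoldExternal`, then the SSG/AF) might look roughly like this. It is a sketch on plain dictionaries, not the Django models: the function names, the `equal_id` key and the shape of the rows are assumptions made for illustration only.

```python
def find_ssg_for_opera(opera_id, external_links):
    """Return the SSG/AF id linked to a HUWA opera id, or None.

    `external_links` stands in for the rows of EqualGoldExternal: each row
    is a dict with 'externalid' (the opera id) and 'equal_id' (the SSG/AF id).
    """
    if not opera_id:
        # opera id 0 means: no opera record, so there is nothing to link
        return None
    for row in external_links:
        if row["externalid"] == opera_id:
            return row["equal_id"]
    return None


def link_sermon(sermon, external_links):
    """Fill the (assumed) 'ssglinks' list of a sermon dict when a link exists."""
    ssg_id = find_ssg_for_opera(sermon.get("opera_id"), external_links)
    sermon["ssglinks"] = [ssg_id] if ssg_id is not None else []
    return sermon
```

A sermon whose `inhalt` record has opera id `0` is still kept in this sketch; it just ends up with an empty link list, which matches the instruction to import such entries without linking them to AFs.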
- `get_locus()` to convert the von/bis fields into a Passim locus string (21/sep/2022)
- `inhalt`: (21/sep/2022)
  - `inms` for the field `inhalt` to see if it matches there
  - `id` field of the found record in `inms` and look it up in `autor_inms` to find the `autor` number
  - `autor` number is the `id` of the 'gold' author table `autor`
  - `huwa_passim_author.json` contains some entries where there are two HUWA author id's for one Passim id. But note that there is little to no ambiguity...
- `handschrift` about the Codex: (21/sep/2022)
  - `support`: use field `material`
  - `extent`: use fields `fol_pag`, `folbl`, `vors_vorne`, `vors_hinten`, `col`, `col_breite`, `zeilen`
  - `format`: use fields `format`, `hs_breite`, `schrift_hoehe`, `schrift_breite`
- `MsItem` (21/sep/2022)
- `MsItem` list gets into the `manuscript` entry (21/sep/2022)
- `ssglinks: []`, filling in the actual Passim SSG link id's that can be taken up. But since I'm doing it right now, it means that it will only include those SSGs that have so far been determined (i.e. from HUWA). There still is a backlog of SSGs to be extracted from the HUWA data.
- `ssglinks` parameter, but instead filled in information at the `signaturesA`: those are the signatures of the SGs that point to SSGs that need to be connected with a particular sermon.
- `ManuscriptUploadJson`, where `Manuscript.custom_add()` is being called with these data, and then `SermonDescr.custom_add()` in turn, which then calls `custom_set()` for the `signaturesA` part.
- `get_opera_signatures()`, and I've now added `signaturesA` to the output per sermon

Statistics of transforming the HUWA manuscripts (with sermons) into JSON:

Item | Count |
---|---|
Manuscripts | 8444 |
Sermons | 57961 |
So, in principle, all of the above is working. Closing this issue now...
Okay, from issue #534, there is one thing that should be done at the level of making a Manuscript JSON:
Opera with only one link to inhalt, should not be linked to an AF
That is to say, when there is:
- `opera` is referred to only once from `inhalt` (e.g. 11670) - this means that there is only one `handschrift` with this opera
- `"manu_count": nn`
- `manu_count` being 1 or higher than 1.

Answers from CW to the questions above.
The fields for the folio numbers (von_bis, bis_f, von_rv and bis_rv) sometimes have strange contents, such as negative numbers and decimals. How should this be interpreted?
There is no handschrift with ID 0 in the list of manuscripts, but it does exist in the table inhalt, with 16 linked items. Do you know what manuscript this can be, or is this a HUWA-internal test?
Handschrift ID 3 has no shelfmark and links to an empty library record (no name), which links to an empty location record (no name, not linked to a country). It has a note "nicht löschen". What should we do with this during the import?
Processing the CW responses from Sep/29. Note that this is in reader/views.py `EqualGoldHuwaToJson` method `get_data()`
- Ignore (do not import) `handschrift` with id `0`
  - `if handschrift_id == 0: continue` line in the Handschrift loop
- Sermon ordering within `inhalt`
  - `order` of the sermon within the manuscript
  - `lst_inhalt` in python on: `von_bis`
  - `lst_inhalt = sorted(lst_inhalt, key=lambda x: x['von_bis'])`
  - `von_bis` and in `bis_f`):
    - `-110`
    - `I`, `II`, `III`, `IV` etc. (those refer to the vorderdeckel)
- Action if a `handschrift` has `bibliothek = 2` and/or it has `signatur` empty:
  - `bibliothek == 2`, don't read. Otherwise: do read (ignore presence/absence of `signatur`)

Read it again (5/oct) with latest counts:

date read | manuscripts | sermons |
---|---|---|
19/sep/2022 | 8444 | 57961 |
5/oct/2022 | 8444 | 57928 |
The above works well now.
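As an illustration of the rules processed above (skip handschrift `0`, skip `bibliothek == 2`, order sermons on `von_bis`), here is a small sketch. The helper names `should_read` and `von_bis_key`, the `ROMAN` table and the choice to sort roman numerals before the numeric folios are assumptions; the sort key is only there because a plain `sorted(..., key=lambda x: x['von_bis'])` raises a `TypeError` in Python 3 as soon as strings like `II` and numbers are mixed.

```python
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5}  # assumed small range


def should_read(handschrift):
    """Apply the two skip rules: id 0 and bibliothek 2 are not imported."""
    if handschrift.get("id") == 0:
        return False
    if handschrift.get("bibliothek") == 2:
        return False
    # Presence or absence of 'signatur' is deliberately ignored
    return True


def von_bis_key(item):
    """Sort key that tolerates roman numerals and non-numeric values."""
    value = item.get("von_bis")
    text = str(value).strip().upper() if value is not None else ""
    if text in ROMAN:
        # Assumption: roman numerals (vorderdeckel) come before the folios
        return (0, ROMAN[text])
    try:
        return (1, float(text))
    except ValueError:
        # Anything that cannot be interpreted goes to the end
        return (2, 0.0)


lst_inhalt = [{"von_bis": 12}, {"von_bis": "II"}, {"von_bis": -110}, {"von_bis": "I"}]
lst_inhalt = sorted(lst_inhalt, key=von_bis_key)
```

Where negative values such as `-110` should really end up in the order is not decided by this sketch; here they simply sort numerically before the positive folio numbers.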
Fix needed (see issue #534):
HUWA_sermons
HUWA_manuscripts
Leafing through the HUWA db tables (also looking at `HUWA-mapping_01.xlsx`), I recognize that there are some more pieces of information that might be / should be put into JSON and then imported.
- `title` of a sermon is in table `tit` (which links to `inhalt`; there could be multiple titles per sermon)
- `date` of a manuscript/codico should be specified in the JSON at the level of the manuscript, where the field `date` should receive the contents of HUWA table `annus` (on the basis of `handschrift`). Note that any characters that are not numeric and not a hyphen should be removed from this field.

Okay, need to be concise here. How is each HUWA field, intended for a Passim Manuscript (including Codico, SermonDescr and what have you) going to be processed?
These are the Passim tables involved:
- `Manuscript`
  - `ManuscriptExternal` - external URL associated with this manuscript
  - `ManuscriptKeyword` - keywords associated with this manuscript
  - `ProvenanceMan` - provenance for a particular manuscript (better: see codico)
  - `LitrefMan` - literature reference for this particular manuscript (including page numbers)
  - `CollectionMan` - a collection of manuscripts. (probably not used in HUWA?)
  - `Codico`
    - `Daterange` - start and finish year of a daterange for a codicological unit (incl ref and pages of that ref)
    - `CodicoKeyword` - any keywords specific for a particular codicological unit
    - `ProvenanceCod` - provenance for this particular codicological unit
    - `OriginCod` - any number of origins that need to be associated with this codicological unit
    - `MsItem` (just for `order` and hierarchy)
      - `SermonDescr`
        - `Range` - a range of bible references for a particular sermon
        - `BibRange` - a range of bible references for a particular sermon (difference with previous?)
          - `BibVerse` - one or more verses from the bibrange
        - `SermonDescrKeyword` - any keywords belonging to a particular sermon
        - `CollectionSerm` - collection(s) in which a sermon is.

Parent | Model | HUWA relevant tables | Status |
---|---|---|---|
none | Manuscript | `bibliothek`, `cla`, `col_bem`, `fasc`, `ff_bem`, `format_bem`, `handschrift` | partly processed |
Manuscript | ManuscriptExternal | - | - |
Manuscript | ManuscriptKeyword | - | - |
Manuscript | ProvenanceMan | - | - |
Manuscript | LitrefMan | - | - |
Manuscript | CollectionMan | - | - |
Manuscript | Codico | - | - |
Codico | Daterange | start, finish: `annus` | ok |
Codico | CodicoKeyword | - | - |
Codico | ProvenanceCod | `herkunft_besitzer` | Added to JSON output |
Codico | OriginCod | - | - |
Codico | MsItem | - | - |
MsItem | SermonDescr | - | - |
SermonDescr | BibRange | - | - |
SermonDescr | SermonDescrKeyword | - | - |
SermonDescr | CollectionSerm | - | - |
BibRange | BibVerse | - | - |
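For the `Daterange` row (start and finish taken from `annus`), the cleaning rule mentioned earlier, keeping only digits and hyphens, could be sketched like this. The function names and the way a cleaned value is split into start and finish are assumptions for illustration; the actual handling in `get_data()` may differ.

```python
import re


def clean_annus(raw):
    """Keep only digits and hyphens from a HUWA 'annus' value."""
    if raw is None:
        return ""
    return re.sub(r"[^0-9-]", "", str(raw))


def to_daterange(raw):
    """Split a cleaned value into (start, finish) years, or None if empty."""
    parts = [p for p in clean_annus(raw).split("-") if p]
    if not parts:
        return None
    # A single year is used for both start and finish
    return (int(parts[0]), int(parts[-1]))
```

A value without any digits (for instance a century given only in roman numerals) yields `None` in this sketch, so such manuscripts would need separate treatment.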
`Manuscript`
- `cla`: unclear what to do with it
- `col_bem`: unclear what to do with it
- `fasc`: unclear what to do with this
- `literatur`: this must be done manually + needs input into Zotero

`SermonDescr` (or `EqualGold`)
- `huwa`: no clear instruction what to do with this
- `identifik`: no clear instruction what to do with this
- `infine`: no clear instructions

`EqualGold`:
- `nebenwerk`

`Manuscript` and `Codico`
- `ff_bem.name` as 'folia comment: ...' to `Codico.extent` - done
- `format_bem.name` as 'format comment: ...' to `Codico.format` - done
- `herkunft_besitzer` into `Codico.provenances` - added to JSON output
- `hs_notiz` information from fields `text` and `bemerkungen` to `Manuscript.notes` - done
- `schreiber` information to new field `Codico.scribe` - done
- `schrift` information to new field `Codico.script` - done

For the process of importing a JSON Manuscript into Passim, some of the matters above lead to action points. This is for issue #534
- `Manuscript.provenances` into the appropriate `Codico`
- `Manuscript.scribeinfo` into the appropriate `Codico`
- `Manuscript.scriptinfo` into the appropriate `Codico`
- `archiv`
- `autor_editionen`
- `bearbeiter`
- `bhl`
- `bhm`
- `bloomfield`
- `collectiones`, `collinhalt`, `col_bem` - these could be processed separately?
- `datentraeger`
- `edenda`
- `editionen`
- `fasc`
- `faszikel`
- `handschrift_archiv`
- `hilfe`
- `huwa`
- `indiculum`
- `infine`
- `katalog_inhalt`, `katalog_name`
- `links`
- `literatur`, `literatur_archiv`
- `loci`
- `mat_bem`
- `nebenwerk`
- `personen`, `personen_publikationen`, `publikationen`
- `reihe`, `reihealt`
- `retr`, `rubriken`
- `saec_bem`
- `schoenberger`, `stegmueller`
- `siglen`, `siglen_edd`
- `thll`
- `user`
- `verfasser`, `verfasser_literatur`
- `verkn`
- `verlage`
- `zeilen_bem`
- `zweitsignatur`
Okay, for correct processing one more step is needed:
- `opera_id`, add this: `[ { "externalid": (opera_id), "externaltype": "huwop" } ]`
But hang on: that step had already been implemented. It's just that no JSON had been produced yet where this popped up.
Well, a good thing! But now we have one: passim_huwa_manu_20221121.json
Some more processing is needed, taken over from issue #596
- `col_bem`: separate from collectiones and collinhalt: gives remarks on the number of columns a manuscript has. Add in Cod. Unit: extent.
- `fasc`, `faszikel`: these are both different ways to indicate numbered codicological units. If possible, for items in handschrift that are either connected to fasc or have an entry in the faszikel column, add one codicological unit for each connection, ordered as the number in fasc or faszikel indicated (if that is not possible, with as the cod. Unit name fasc_name or faszikel_name).
- `infine`: manifestation: postscriptum (infine_text)
- `mat_bem`: notes on the material of the manuscript: add in codicological unit: support.
  - `support` for the manuscript
- `saec_bem`: remarks on the date of manuscripts. Please add this in codicological unit: notes.
  - `codico_notes`
- `siglen`, `siglen_edd`: give information about which manuscripts were used for critical editions (siglen) and which older editions were used for critical editions (siglen_edd). This information should be added to the manuscripts in the form of a note “Used for [edition reference]”.
  - `siglen` information will now be added, it will be imported into Manuscript as part of the `raw` information. That means this needs to be interpreted correctly for issue #534
  - `siglen_edd` to the `siglen` information, using the `editionen` identifier as the crucial element to 'bind' them together.
- `zeilen_bem`: remarks on the field ‘zeilen’ in the table handschrift; has to be added in codicological unit: extent following the information in `zeilen`.
  - `get_extent()`. If there is any note on the 'zeilen', it is now added as `(note: this is a note)` after the information on 'zeilen' proper.
- `zweitsignatur`: old shelfmarks. Add these in the manuscript details: notes as “Old shelfmark(s): [name]”.
  - `notes` to the manuscript

Unclear how to process `fasc`, `fasc_name` as codicological order number in handschrift `195`, since `fasc_name` just iterates between `0`, `2` and `3`.
Pragmatic solution: just add the strings fasc_name and/or faszikel_name to the codicological title.
This adds element `codico_name` to the Manuscript JSON object
I've now added the contents of tables [literatur] and [editionen] into tables in the `reader` app.
The table `Literatur` (without 'e' at the end) also contains the contents of Bloomfield, Stegmueller, Schoenberger etc.
This means that references to editions can now be made to the table `reader.Edition`
We are now working in reader/views.py, class `EqualGoldHuwaToJson`, method `get_data()`.
Look for `import_type == "manu"` within that method.
This class is the base class for `ManuscriptHuwaToJson`, with url name `manuscript_huwajson`, which is the download `Huwa Manuscripts: json` that is callable from the Manuscript Listview.
So, remember, once the JSON data has been made in-line with the HUWA data, and downloaded correctly, there is a next step...
And the next step is issue #534, where the Huwa data as added into JSON should be 'read' and processed into even better and more beautiful Manuscript, Codico, MsItem and SermonDescr objects!
- `siglen` now treated properly?
  - `lat. 13376` has a whole series of edition id's and most of them have a siglen `A<SUP>1</SUP>`, but I have no idea whether this is okay or how to verify this

This manuscript has the following list of siglen as linked to editions (siglen at the end):
279: Morin, Germain, CC SL, 333-336, Corpus Christianorum Series Latina, 103 - A<SUP>1</SUP>
298: SChr, 310-324, -, 243 - A<SUP>1</SUP>
299: SChr, 310-324, -, 243 - A<SUP>1</SUP>
247: Morin, Germain, CC SL, 877-881, Corpus Christianorum Series Latina, 104 - A<SUP>1</SUP>
407: 1968, PLS, 397-400, Patrologiae Latinae supllementum, 4, Paris - A<SUP>1</SUP>
408: Verbraken, Pierre-Patrick, 1961, Rev. Bén., 13-17, Revue Bénedictine, 71 - A<SUP>1</SUP>
7457: Caillau, b12-b13, Sancti Aurelii Augustini Sermones inediti (Operum Suppl. 1-3), II - A{h2h}, vlim, maur, verbr
776: Goldbacher, Alois, 1911, CSEL, 380-387, Corpus Scriptorum Ecclesiasticorum Latinorum, 57, Wien - A{h1h}
784: Goldbacher, Alois, 1911, CSEL, 380-387, Corpus Scriptorum Ecclesiasticorum Latinorum, 57, Wien - A{h1h}
785: Goldbacher, Alois, 1911, CSEL, 380-387, Corpus Scriptorum Ecclesiasticorum Latinorum, 57, Wien - A{h1h}
786: Goldbacher, Alois, 1911, CSEL, 380-387, Corpus Scriptorum Ecclesiasticorum Latinorum, 57, Wien - A{h1h}
787: Goldbacher, Alois, 1911, CSEL, 380-387, Corpus Scriptorum Ecclesiasticorum Latinorum, 57, Wien - A{h1h}
803: Goldbacher, Alois, 1911, CSEL, 380-387, Corpus Scriptorum Ecclesiasticorum Latinorum, 57, Wien - A{h1h}
808: Goldbacher, Alois, 1911, CSEL, 380-387, Corpus Scriptorum Ecclesiasticorum Latinorum, 57, Wien - A{h1h}
811: Goldbacher, Alois, 1911, CSEL, 380-387, Corpus Scriptorum Ecclesiasticorum Latinorum, 57, Wien - A{h1h}
812: Goldbacher, Alois, 1911, CSEL, 380-387, Corpus Scriptorum Ecclesiasticorum Latinorum, 57, Wien - A{h1h}
814: Goldbacher, Alois, 1911, CSEL, 380-387, Corpus Scriptorum Ecclesiasticorum Latinorum, 57, Wien - A{h1h}
295: SChr, 310-324, -, 243 - A{h1h}
825: Goldbacher, Alois, 1911, CSEL, 380-387, Corpus Scriptorum Ecclesiasticorum Latinorum, 57, Wien - A{h1h}, caes.
826: Goldbacher, Alois, 1911, CSEL, 380-387, Corpus Scriptorum Ecclesiasticorum Latinorum, 57, Wien - A{h1h}, caes. (p. 144-147)
7616: PL, 513-515, Patrologiae Latinae cursus completus, 102 - A{h2h}
The `siglen` provide a sign for the `edition` (from table `editionen`).
A `siglen` item links, on the one hand, to a particular manuscript (called `handschrift`) and on the other hand to a particular `edition` whose entry connects with a particular `opera` (i.e. a passim `SermonDescr`).
So the siglen operate at the sermon rather than at the manuscript level.
Okay, so right now we've made available the `siglen` links (as well as `siglen_edd`) in the downloaded manuscript as a list of combinations between `editionen` id and the `siglen` text.
But actually, if we want to know the literature (including editions) for a particular sermon, we need to look at the Huwa table `editionen` (which is now `reader.Edition` in Passim) and find all entries for a particular `opera` number.
So: instead of taking up the `siglen` information into the manuscript json, it may be better to provide this idiosyncratic siglen information right into the `reader.Edition` system? That way, we can leave the Manuscript JSON idea 'cleaner'.
- `huwa_edilit.json` and downloaded it with siglen and siglen_edd
- `Siglen` and `SiglenEdd`
- `huwa_edilit.json` and add them into the appropriate places in tables `Siglen` and `SiglenEdd`
- `Edition` table).

As far as issue #534 is concerned (importing the HUWA manuscript JSON into the Passim application), how will the correct editions be showable with this 'new' system, where the `edition` information is in Passim table `Edition` (and all linked to it)?
- `SermonDescr`) comes with an external connection to HUWA's original `opera`
- `Manuscript`, it also has access to the external manuscript's connection to HUWA's original `handschrift`
- `Edition` method `get_opera_literature(opera_id, handschrift_id)`
- `Edition`, where the `opera_id` is used to find all the literature references for that particular sermon (irrespective of manuscript)
- `siglen` as well as the `siglen_edd` information is added for each literature reference

Okay, this then seems to finish issue #533, and we now turn back again to #534, to see if it all matches...
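To make the idea behind `get_opera_literature(opera_id, handschrift_id)` concrete, here is a rough stand-alone sketch on lists of dictionaries. It is not the method on `Edition`: the extra parameters, the dictionary keys and the assumption that `siglen` rows carry a handschrift id while `siglen_edd` rows do not are all made up for illustration.

```python
def get_opera_literature(opera_id, handschrift_id, editions, siglen, siglen_edd):
    """Collect the editions for one opera, with sigla for one manuscript.

    Stand-in data shapes (assumptions):
      editions   - dicts with 'id', 'opera', 'text'
      siglen     - dicts with 'edition', 'handschrift', 'text'
      siglen_edd - dicts with 'edition', 'text'
    """
    result = []
    for edition in editions:
        # Literature is selected on the opera only, irrespective of manuscript
        if edition["opera"] != opera_id:
            continue
        entry = {
            "edition": edition["id"],
            "text": edition["text"],
            # Sigla under which this manuscript is cited in the edition
            "siglen": [s["text"] for s in siglen
                       if s["edition"] == edition["id"]
                       and s["handschrift"] == handschrift_id],
            # Sigla of older editions used by this edition
            "siglen_edd": [s["text"] for s in siglen_edd
                           if s["edition"] == edition["id"]],
        }
        result.append(entry)
    return result
```

The point of the design is that the edition list depends on the sermon's opera alone, while the manuscript id only decides which sigla get attached to each reference.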
Well, ahum, when evaluating the findings, there is one little matter coming up: the libraries. E.g.
- `Laici` is not recognized as such, because the Passim variant has an additional space
- `Bibliothèque Nationale` is not recognized, because Huwa doesn't add "de France"

Didn't we have a library matching Excel?
Well, we have `lib_huwa_new_EK.xlsx` (28/sep/2022), which contains some lines as "In Passim als...X", and lots of lines in quotation marks.
We also have `huwa_passim_library.json` (19/jul/2022), containing sections:
1 - `huwaonly`
2 - `huwapassim`: which HUWA id belongs to which PASSIM library id
3 - `libraries`: all of the libraries with at least passim ID, and where possible the applicable HUWA id

And e.g. for HUWA library 1 (the BNF), there is the corresponding passim id 4814.
We actually have issue #567
And later we had a discussion to add some 'new' libraries from Huwa.
But there is one document that did not get the attention it should have, and that is the `lib_huwa_new_EK.xlsx`.
This contains a full list of corrections by Menna for some 212 half-done Huwa libraries, translating them into Passim.
I should now process them as part of issue #567...
Okay, the above is now working correctly (the library part).
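The two kinds of name mismatch mentioned above (a stray space, and a shorter HUWA name) can be illustrated with a small sketch. This is not how the issue was resolved here, which went through the HUWA-to-Passim id mapping and the corrections spreadsheet; the function names and the `aliases` table are hypothetical.

```python
import re


def normalize(name):
    """Collapse whitespace and case so that spacing variants compare equal."""
    return re.sub(r"\s+", " ", name).strip().lower()


def match_library(huwa_name, passim_names, aliases=None):
    """Return the Passim library name matching a HUWA name, or None.

    `aliases` maps HUWA spellings to Passim spellings for cases that
    normalization cannot solve (a hypothetical, manually curated table).
    """
    aliases = aliases or {}
    wanted = normalize(aliases.get(huwa_name, huwa_name))
    for candidate in passim_names:
        if normalize(candidate) == wanted:
            return candidate
    return None
```

Whitespace normalization only helps with leading, trailing or doubled spaces; a name that differs in wording, such as the missing "de France", still needs an explicit mapping, which is why an id-based table is the more robust route.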
Other left overs:
Result: done
- `Pseudo-Augustinus` entered into the Author table, while these are not referenced anywhere, and they should not be, since their equivalent is already in the list of authors (`Augustinus Hipponensus (pseudo)`). I don't know, cannot see, how this has come into being. I'm also not sure how to remedy this...

Add some additional fields from `opera`, translating them into corresponding `SermonDescr` ones:
- `opera.abk`: this field doesn't have any specific treatment comments. However, it functions like a sermon `title`, so let's put it there in that field.
- `opera.bemerkungen`: this field doesn't have any comments either. But this is really something that should appear in the `Note` field of a sermon
- `opera.opera_langname`: this field is not processed either, but could be assigned to the `subtitle` field. In a couple of cases this is just the 'full' version of `abk`.

All issues above have been addressed. Perhaps this is enough...
Part of issue #530
Convert the manuscripts in the HUWA database to JSON