Princeton-CDH / pemm-scripts

scripts & tools for the Princeton Ethiopian Miracles of Mary project
Apache License 2.0

spreadsheet field changes and additions #20

Closed rlskoeser closed 4 years ago

rlskoeser commented 4 years ago

Documented based on meeting notes and comments on the current test version of the spreadsheet.

Question: should "other ids" and "other titles" be removed (or renamed)? They strike me as not useful; I think we will want specific fields for specific ids and titles.

WendyLBelcher commented 4 years ago

Seems to be a little glitch in the Manuscripts sheet. It has a missing head in the second row for the column Total Stories. It says "0" instead of "Total Stories".

WendyLBelcher commented 4 years ago

Re Manuscripts sheet, the first column in the second row is titled "Collection ID". But, the title in the first row is "Name." Neither is the right title. This column needs to be renamed "MS Name" or "Manuscript Name." My suggestion of a new column title was based on my own confusion just now about what information was in that column. I know that it is the Manuscript sheet, so we should know Name is Manuscript Name, but a cue at the beginning of that column on what sheet one is in would be really helpful.

WendyLBelcher commented 4 years ago

Re Manuscript sheet, but in general also, I don't like when fields with numbers sort by their first digit and not by their actual value. Thus, in column "ID" we get this nonsense order: 1 10 ... 19 2 20 and so on. Can we (1) number manuscripts starting with 0 or (2) change how those fields sort? The second seems possible because the "Macomber ID" in Canonical Sheet does not have the nonsense order.
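
The two orderings described above can be reproduced in a few lines of Python. This is only an illustrative sketch, not the project's import script: the `ids` list is made-up sample data, and zero-padding is one possible reading of option (1).

```python
# Hypothetical sample of manuscript IDs stored as text, as in a spreadsheet column.
ids = ["1", "10", "19", "2", "20", "3"]

# Sorting as text compares character by character, so "10" lands before "2".
text_order = sorted(ids)

# Sorting with key=int compares the numeric values instead.
numeric_order = sorted(ids, key=int)

# Zero-padding (one reading of option 1) makes a plain text sort agree
# with the numeric order, as long as the padding width is large enough.
padded_order = sorted(i.zfill(2) for i in ids)

print(text_order)     # ['1', '10', '19', '2', '20', '3']
print(numeric_order)  # ['1', '2', '3', '10', '19', '20']
print(padded_order)   # ['01', '02', '03', '10', '19', '20']
```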

rlskoeser commented 4 years ago

@WendyLBelcher

WendyLBelcher commented 4 years ago

Re Canonical story, mostly the sheet looks good. But, the Story Origin field doesn't seem to be working. And, we will need to import those Hamburg numbers from here. It should be straightforward, just matching them with Macomber numbers and dumping them into Hamburg ID. https://docs.google.com/spreadsheets/u/1/d/1VsCljxWcCa0E4hTKRPnyKuAgb-edXTQKVpaYG6LwVCE/edit#gid=0

WendyLBelcher commented 4 years ago

Re Collection sheet, one title did not come through. "Collection Name" is empty in second row. So, just a tiny problem. No others seen. (Later, I need to get the Lat/Long data and enter it.)

WendyLBelcher commented 4 years ago

Yes, these would be good field labels: Manuscript Name and Manuscript ID

WendyLBelcher commented 4 years ago

Re the 91a, yes, that's garbage, now corrected.

WendyLBelcher commented 4 years ago

By the way, I just created a spreadsheet with the Hamburg IDs, matched up to Macomber IDs. It might be useful to import those Hamburg identifiers (using the Macomber IDs) sooner rather than later. That's because Hamburg has given the miracles helpful names, which may help us with some of the profusion. https://docs.google.com/spreadsheets/d/1VsCljxWcCa0E4hTKRPnyKuAgb-edXTQKVpaYG6LwVCE/edit#gid=0
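
A minimal sketch of how that matching could work, assuming both sheets have been read into lists of dictionaries keyed by column name. The function name `add_hamburg_ids`, the column names in the Hamburg spreadsheet, and all sample values are hypothetical placeholders, not the real data.

```python
def add_hamburg_ids(canonical_rows, hamburg_rows):
    """Copy each Hamburg ID onto the canonical story with the same Macomber ID."""
    # Build a lookup from Macomber ID to Hamburg ID (assumed column names).
    lookup = {row["Macomber ID"]: row["Hamburg ID"] for row in hamburg_rows}
    for row in canonical_rows:
        # Stories with no match keep an empty Hamburg ID instead of failing.
        row["Hamburg ID"] = lookup.get(row["Macomber ID"], "")
    return canonical_rows


# Placeholder rows; real values would come from the two spreadsheets.
canonical = [{"Macomber ID": "13"}, {"Macomber ID": "1-C"}]
hamburg = [{"Macomber ID": "13", "Hamburg ID": "HAMBURG-A"}]

merged = add_hamburg_ids(canonical, hamburg)
```

Comparing the IDs as text rather than as numbers means that IDs with letters in them, such as 1-C, can be matched the same way as purely numeric ones.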

WendyLBelcher commented 4 years ago

Re the Story Instance sheet, and the Miracle Number and Manuscript Number fields. Collections with dashes and only one number are coming in not quite right (e.g., C-1 or CBS-30 or G-1). Those numbers are the Miracle Number, not the Manuscript Number (which in every case is "1"). So, the following four collections with only one manuscript have to be treated differently: C-, CL-, CBS-, and G-.

To give the long-winded explanation, which you can probably ignore: currently, we are converting G-1 and G-2 and G-22 just as we said to, so that it becomes collection and manuscript number, e.g., C-Veroli (BGV) 1, C-Veroli (BGV) 2, C-Veroli (BGV) 22. However, it turns out that number is not a "Manuscript Number" but a "Miracle Number." (The import helped me because I noticed we had no folio or miracle number, and we have to have one or the other). So, I went back and studied the Macomber pdf and found that in fact, there is only one manuscript. We currently wrongly represent G collection as holding 150 manuscripts. No, it is one manuscript with 150 stories in it. So, I don't know what the script should say but G-1 and G-2 and G-22 should all be imported as "Manuscript Name" = "C-Veroli (BGV)" and the "Manuscript ID" for all of them should be "1" and the numbers (1, 2, and 22) should be entered into "Story Instance Sheet" under the "Miracle Number." Let me know if I can explain that more clearly.
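
One way the rule above might be expressed, as a sketch only. The function `split_reference`, the output keys, and the prefix `ZZ` are hypothetical; the only parts taken from this thread are the four single-manuscript prefixes and the rule that their number is a miracle number while the manuscript ID is 1.

```python
import re

# The four collections that hold a single manuscript (from the comment above).
SINGLE_MANUSCRIPT_PREFIXES = {"C", "CL", "CBS", "G"}


def split_reference(ref):
    """Split a reference like 'G-22' into manuscript ID and miracle number."""
    match = re.fullmatch(r"([A-Z]+)-(\d+)", ref.strip())
    if match is None:
        return None  # not a dash-plus-one-number reference
    prefix, number = match.group(1), int(match.group(2))
    if prefix in SINGLE_MANUSCRIPT_PREFIXES:
        # One manuscript only: the number counts miracles, not manuscripts.
        return {"collection": prefix, "manuscript_id": 1, "miracle_number": number}
    # Assumption: for any other prefix the number is still a manuscript number.
    return {"collection": prefix, "manuscript_id": number, "miracle_number": None}


print(split_reference("G-22"))
# {'collection': 'G', 'manuscript_id': 1, 'miracle_number': 22}
print(split_reference("ZZ-5"))  # 'ZZ' is a made-up prefix for contrast
# {'collection': 'ZZ', 'manuscript_id': 5, 'miracle_number': None}
```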

WendyLBelcher commented 4 years ago

Re the Story Instance sheet and various fields with numbers, can we have the following fields sort numerically, not textually: Canonical Story ID | Folio Start | Column Start | Line Start | Folio End | Column End | Line End | Miracle Number | Number of Paintings

WendyLBelcher commented 4 years ago

Re the Story Instance sheet and the Folio Start and Folio End fields, I would like to standardize the abbreviations for recto and verso. Right now I sometimes used a/b for them and sometimes used r/v. So to correct that, when importing "Folio Start" and "Folio End," let's convert all "r" to "a" and all "v" to "b". So "6938 (39v); 4618 (120r)" would come in as "6938 (39b); 4618 (120a)".
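
The conversion described above could be done with a small regular expression. This is a sketch under the assumption that a side marker is always an "r" or "v" (or "rv") directly following the folio digits; `normalize_sides` is a hypothetical helper name.

```python
import re

# Map recto/verso letters to the a/b convention.
_SIDE_TABLE = str.maketrans({"r": "a", "v": "b"})


def normalize_sides(folio_text):
    """Replace r/v side markers that directly follow a folio number with a/b."""
    # (?<=\d) requires a digit just before the letters, and \b requires the
    # letters to end the token, so other words containing r or v are untouched.
    return re.sub(
        r"(?<=\d)[rv]+\b",
        lambda m: m.group(0).translate(_SIDE_TABLE),
        folio_text,
    )


print(normalize_sides("6938 (39v); 4618 (120r)"))
# 6938 (39b); 4618 (120a)
```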

rlskoeser commented 4 years ago

We discussed standardizing recto/verso and I forgot, thanks for coming back to it and telling me exactly how to convert it.

I don't understand your comment about sorting the story instance sheet, but I don't think it's really practical: the story instance rows are in the order they occur in the Macomber text file, and I'm not sure how practical it would be for me to sort them. They should already be grouped by canonical story id, since that's how they occur in the text file.

WendyLBelcher commented 4 years ago

Re the sorting, I used the wrong word. This is the ordering issue we discussed in Slack of numerical versus textual ordering. Does that help? You said "It looks like I'm sorting them as text instead of numbers. From what I can see, they're all numeric except for EMML (HMML) 91a, is that a typo? If it is, then I should be able to update to sort numerically (which I agree would be much easier to work with)."

WendyLBelcher commented 4 years ago

Re the Story Instance sheet and Folio End field, we are automatically importing the folio numbers into both the Folio Start and Folio End fields. I vaguely remember we might have even said that we should do that in a meeting or something. But, it's not correct. We only do that for three collections, PEth, EMDL, EMIP, and even then only in exceptional cases, when they do not actually have a Folio End number. To give some examples:

"EMML: 3872(128b)" has a Folio Start (128b) and no Folio End
"VLVE 298(39b)" has a Folio Start (39b) and no Folio End

"PEth: 46.98 (115r-116r)" has a Folio Start (115r) and a Folio End (116r)
"PEth: 46.52 (53rv)" has a Folio Start (53r) and a Folio End (53v)
"EMIP: 601.278 (160rv)" has a Folio Start (160r) and a Folio End (160v)

The cases where you repeat the Folio Start in the Folio End for PEth, EMDL, or EMIP are those where there is only one number, like this: "EMIP: 601.279 (160v)" has a Folio Start (160v) and a Folio End (160v).
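
These rules can be sketched as a small function that takes the collection abbreviation and the text inside the parentheses. The name `folio_range` is hypothetical, and two points are assumptions rather than anything decided here: that a combined "rv" splits the same way for every collection, and that "ab" should be treated like "rv".

```python
import re

# Collections where a single folio is repeated as the Folio End.
REPEAT_START_COLLECTIONS = {"PEth", "EMDL", "EMIP"}


def folio_range(collection, folio):
    """Return (Folio Start, Folio End) for the text inside the parentheses."""
    folio = folio.strip()
    if "-" in folio:
        # Explicit range such as '115r-116r'.
        start, end = (part.strip() for part in folio.split("-", 1))
        return start, end
    both_sides = re.fullmatch(r"(\d+)(rv|ab)", folio)
    if both_sides:
        # '53rv' covers both sides of one leaf (assumed: 'ab' works the same).
        number, sides = both_sides.groups()
        return number + sides[0], number + sides[1]
    if collection in REPEAT_START_COLLECTIONS:
        # Single folio in PEth, EMDL or EMIP: repeat the start as the end.
        return folio, folio
    # Any other collection: a single folio has no Folio End.
    return folio, None


print(folio_range("EMML", "128b"))       # ('128b', None)
print(folio_range("PEth", "115r-116r"))  # ('115r', '116r')
print(folio_range("PEth", "53rv"))       # ('53r', '53v')
print(folio_range("EMIP", "160v"))       # ('160v', '160v')
```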

WendyLBelcher commented 4 years ago

Re the Story Instance sheet and the Miracle Number field, we need to generate this Miracle Number field when it is absent (through a calculus using the number of Canonical Stories in a manuscript? Or by number of Folio Starts in a particular manuscript?).

WendyLBelcher commented 4 years ago

Re the Story Instance Sheet and the Macomber field, I'm not sure if there is a problem or if my use of sort is causing the problem. I use sort to see patterns, but in Excel, if you sort one column, the rest sort along with it. So, maybe Google Sheets is not like that and when I sort one column, the others stay fixed.

Nevertheless, I think we have a problem with the Macomber IDs and Story Titles not staying attached. For instance, in the Macomber text file, 13 is the Macomber ID for the story titled "The composition of the Miracles of Mary by Bishop Hildephonsus of Toledo." But somehow, 13 is attached instead to "The arrival of the Holy Family in Sāmǝnon," which is in turn attached to dozens of Macomber IDs, including 10, 11, 12, 13, 14, and 15, but also 150 and so on, which should be impossible. Its actual Macomber ID is 1-C. Maybe the C is throwing things off?

rlskoeser commented 4 years ago

@WendyLBelcher thanks for all your careful testing and comments. Some responses.

Regarding your sorting problems: I think it's probably possible to do the sorting you want in Google Sheets, but I don't think it's advisable. Because everyone working on it is looking at the same document, I think sorting it could have unexpected results. When you want to sort and work with the data like that, I recommend you download the Google Sheets document as an Excel file and work with it that way (I checked and it's an option! File -> Download).

rlskoeser commented 4 years ago

Re the Story Instance sheet and various fields with numbers, can we have the following fields sort numerically, not textually: Canonical Story ID | Folio Start | Column Start | Line Start | Folio End | Column End | Line End | Miracle Number | Number of Paintings

Actually, I misspoke when I said we could do this — we can do it for some fields, but not for all of them. Canonical story id includes letters in some cases and folio start and end always do, so it's not possible to sort them numerically (at least, not that I know of within google sheets or excel). The others that are strictly numeric can be, but I'm not sure how valuable that is without being able to sort on the others.
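
Outside the spreadsheet, for example in a script that prepares the data before import, values that mix digits and letters can still be put in a numeric-aware order with a so-called natural sort key. The sketch below is an illustration in Python only, not a Google Sheets or Excel feature; `natural_key` is a hypothetical helper and the values are samples.

```python
import re


def natural_key(value):
    """Split text into digit and non-digit runs so numbers compare as numbers."""
    parts = re.findall(r"\d+|\D+", value)
    # Tag each part so integers are only ever compared with integers
    # and text with text; numeric parts sort before text parts.
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts]


folios = ["120a", "39b", "6a", "39a"]
story_ids = ["10", "1-C", "2", "1"]

print(sorted(folios, key=natural_key))     # ['6a', '39a', '39b', '120a']
print(sorted(story_ids, key=natural_key))  # ['1', '1-C', '2', '10']
```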

WendyLBelcher commented 4 years ago

Re the text file and the collection called LUE and where its numbers go. The first is a manuscript number in the Manuscript sheet, the second is a miracle number in the Story Instance sheet.

You should ignore what follows, this is just for Wendy. Later, I'm going to do some alterations to that number, since the 30 should be 11, so I want to note that here:
LUE 30-40; should become L-Uppsala Ms no 11, miracle number 40
LUE 31-65; should become L-Uppsala Ms no 12, miracle number 65
LUE 32-39; should become L-Uppsala Ms no 15, miracle number 39
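
As a rough sketch of the split described in this comment (first number to the Manuscript sheet, second to the Story Instance sheet), assuming LUE references always look like "LUE 30-40;". The function `parse_lue` and its output keys are hypothetical, and it deliberately leaves the manuscript number as written, without the later manual renumbering noted above.

```python
import re


def parse_lue(ref):
    """Split an LUE reference into manuscript number and miracle number."""
    match = re.fullmatch(r"LUE\s+(\d+)-(\d+);?", ref.strip())
    if match is None:
        return None  # not an LUE reference in the assumed format
    return {
        "manuscript_number": int(match.group(1)),  # goes to the Manuscript sheet
        "miracle_number": int(match.group(2)),     # goes to the Story Instance sheet
    }


print(parse_lue("LUE 30-40;"))
# {'manuscript_number': 30, 'miracle_number': 40}
```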

WendyLBelcher commented 4 years ago

Nic is working on letting workers adjust column lengths. Right now, the main place where it creates a problem for the user is the Canonical Story Title field on the Story Instance sheet. It doesn't show anything when one hovers over the cell or clicks on it.

WendyLBelcher commented 4 years ago

Wendy made a mistake in the original data for repositories. I have three collections at the repository of the Bibliothèque Nationale. For two collections I gave the repository abbreviation as BN and for one I have BNF. So, I need to change that somewhere so that the repository of the Bibliothèque Nationale always appears as BNF.

WendyLBelcher commented 4 years ago

So far as my comments are concerned, you can ignore everything above. What follows are the only things left to do: For Manuscript Sheet:

WendyLBelcher commented 4 years ago

For Canonical Story Sheet:

WendyLBelcher commented 4 years ago

For Collection sheet:

WendyLBelcher commented 4 years ago

For Story Instance sheet:

WendyLBelcher commented 4 years ago

I think this is done. I had one thing here remaining, but I suspect it has been addressed already, so I'm closing. The one thing: Rename column head as "Canonical Story Title" (maybe this was already done?)

rlskoeser commented 4 years ago

@WendyLBelcher I left that unchecked because I couldn't figure out anything that needs to be done, but I guess that means that it has already been done.

Thanks for closing the issue.