Princeton-CDH / pemm-scripts

scripts & tools for the Princeton Ethiopian Miracles of Mary project
Apache License 2.0

As a researcher, I want the structured text file and incipits parsed into canonical stories, story instances, and manuscripts and imported into Google Sheets so I can work with the data in a more structured form. #11

Closed rlskoeser closed 4 years ago

rlskoeser commented 4 years ago
WendyLBelcher commented 4 years ago

Just a note to say that the issue of "mss ids like PEth 41.8 and EMIP 601.225" is important. Just in case it is getting lost in the mix. I don't know how to comment field!

rlskoeser commented 4 years ago

@WendyLBelcher it's on my list! I'm sorry I haven't been able to get back to it yet. I started looking at it but discovered I needed to add some tests before I updated my script because I'm handling so many different cases now, and was worried about breaking things I already have working...

rlskoeser commented 4 years ago

Increasing from 3 to 5 pts for the complexity of handling variation in manuscript references and available information.

rlskoeser commented 4 years ago

Problem documented by @elambrinaki on https://github.com/Princeton-CDH/pemm-scripts/issues/29

On the Story Instance sheet, there are repeated rows, which I assume result from importing manuscripts that are separated by ";" without the manuscript name being repeated.

For example, on the Story Instance sheet, Canonical Story ID 404 is matched to G-Vatican (BAV) 92 three times. In the primary source, MAC0404 is listed in three different G-Vatican (BAV) manuscripts (GVE 92(7a); 146(69b); 242(23b)). In our Google Sheets, instead of three different rows (G-Vatican (BAV) 92, folio start 7a; G-Vatican (BAV) 146, folio start 69b; and G-Vatican (BAV) 242, folio start 23b), we have three identical rows (G-Vatican (BAV) 92, folio start 7a).

The same thing happens with this story and CR-Paris (BNF) 52. In the primary source, there are three manuscripts (CRA 52-91; 52(12a); 55(10b)), so the import should result in 1) CR-Paris (BNF) 52, miracle number 81, 2) CR-Paris (BNF) 52, folio start 12a, 3) CR-Paris (BNF) 55, folio start 10b.
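
To make the expected behavior concrete, here is a minimal sketch of carrying the collection abbreviation forward across ";"-separated references. It is not the project's actual import script: the names `ManuscriptRef`, `REF_RE`, and `parse_refs` are hypothetical, and the reading of the notation (`92(7a)` as manuscript 92 with folio start 7a, `52-91` as manuscript 52 with a miracle number after the hyphen) is an assumption based only on the examples quoted above. Mapping abbreviations such as GVE to repository names like G-Vatican (BAV) is left out.

```python
import re
from typing import List, NamedTuple, Optional


class ManuscriptRef(NamedTuple):
    """One parsed reference (hypothetical structure, for illustration only)."""
    collection: str                       # abbreviation as written, e.g. "GVE"
    manuscript: str                       # manuscript number, e.g. "92"
    folio_start: Optional[str] = None     # e.g. "7a" from "92(7a)"
    miracle_number: Optional[str] = None  # e.g. "91" from "52-91" (assumed reading)


# Optional collection abbreviation, then a manuscript number, then either
# "(folio)" or "-miracle number". This pattern is an assumption derived from
# the examples in this thread, not from the full source file.
REF_RE = re.compile(
    r"^(?:(?P<collection>[A-Za-z][\w-]*)\s+)?"
    r"(?P<manuscript>\d+)"
    r"(?:\((?P<folio>[^)]+)\)|-(?P<miracle>\d+))?$"
)


def parse_refs(text: str) -> List[ManuscriptRef]:
    """Split on ';' and reuse the last seen collection when it is omitted."""
    refs = []
    collection = None
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        match = REF_RE.match(part)
        if match is None:
            raise ValueError(f"unrecognized manuscript reference: {part!r}")
        # Keep the previous collection when this part does not name one.
        collection = match.group("collection") or collection
        if collection is None:
            raise ValueError(f"no collection given for reference: {part!r}")
        refs.append(
            ManuscriptRef(
                collection,
                match.group("manuscript"),
                match.group("folio"),
                match.group("miracle"),
            )
        )
    return refs


for ref in parse_refs("GVE 92(7a); 146(69b); 242(23b)"):
    print(ref.collection, ref.manuscript, ref.folio_start)
```

The key point is that only the collection is carried forward; the manuscript number and folio are taken fresh from each ";"-separated part, so the three references produce three distinct rows rather than three copies of the first.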

WendyLBelcher commented 4 years ago
rlskoeser commented 4 years ago

Revised to correct the repository mapping and fix the logic for repeating manuscripts when the manuscript name/repository does not repeat.

WendyLBelcher commented 4 years ago

I believe this Issue can be closed.