Princeton-CDH / pemm-scripts

scripts & tools for the Princeton Ethiopian Miracles of Mary project
Apache License 2.0

As a researcher, I want to apply schema and data validation to an existing sheet so I can import data as a starting point. #17

Closed · thatbudakguy closed this issue 4 years ago

thatbudakguy commented 4 years ago

dev notes

as a consequence of switching to the "bound script" model (see #13 for more details), we will want to convert all of the spreadsheet setup logic to operate non-destructively on an existing sheet that may already be populated with data.
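for context, this is roughly what the bound-script model implies on the code side. the menu labels match the testing instructions below; the handler function names are placeholders for illustration, not necessarily what the real script uses:

```ts
// sketch only: in a bound script, onOpen() runs each time the spreadsheet
// is opened and registers the custom menu in the UI. the handler names
// (setUpActiveSheet, etc.) are hypothetical placeholders.
function onOpen(): void {
  SpreadsheetApp.getUi()
    .createMenu('PEMM')
    .addItem('set up active sheet', 'setUpActiveSheet')
    .addItem('set up all sheets', 'setUpAllSheets')
    .addItem('set up validation', 'setUpValidation')
    .addToUi();
}
```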

testing notes

to test this, have a look at the "issue 17 testing" folder that has been created in the PEMM google drive, inside "development & design". you'll see two files:

we want to test two primary things:

1. that the schema (headers) can be applied to an existing sheet that already contains data, without destroying that data
2. that data validation can be applied across sheets once they have all been set up

note that we are not testing the actual schema in this issue. if the naming, placement, or data in some fields looks wrong, it probably is - that will be fixed and tested on a different issue. we're only testing the two things listed above.

instructions

follow the steps below. if at any point you can't complete the step or something looks wrong, leave a comment below (with a screenshot, if applicable) explaining the issue. feel free to comment or slack with questions as well!

  1. open the PEMM spreadsheet. this spreadsheet was created just for testing this issue; it's okay to edit the data in it or delete it.
  2. notice that, after a moment, a new menu appears at the end of the menu bar (after "Help"). it should be called "PEMM". you may see a "Running script" notification appear for a second before the menu appears.
  3. open the menu and check that three options appear: "set up active sheet", "set up all sheets", and "set up validation".
  4. make sure that the spreadsheet is empty - there should be only one sheet with the default name ("Sheet1") and no data yet.
  5. let's import the Manuscript data. choose File -> Import and browse to the 2020-02-10-manuscripts.csv file. it's in the PEMM drive, under "development & design", in the "issue 17 testing" folder.
  6. when you see the dialog appear after selecting the file, choose "Insert new sheet(s)" for "import location". leave the rest as-is, and click "import data".
  7. a new sheet will be created. click on the small triangle next to its name and choose "rename". rename the sheet Manuscript (singular!). this is very important - the script can't find sheets unless their name matches the schema exactly.
  8. let's try applying our schema to this sheet. from the new PEMM menu, choose "set up active sheet". you may get a dialog from google about granting permissions: if so, click "continue" and follow the steps to log in.
  9. watch as the script updates the headers in the sheet. when the green "Running script" dialog goes away, it's finished.
  10. notice that we now have two rows of headers, since the data in the manuscript sheet already had headers. this is somewhat by design: you can sanity-check the data by making sure that the headers in the second row match the new ones in the first row. if anything is off, your data doesn't match the schema! (a rough sketch of this behavior appears just after these steps.)
  11. if the headers are identical, go ahead and select the second row of headers (the unstyled one) by clicking the row number 2 on the left, then right-click (or ctrl+click on mac) the row and choose "delete row".
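a rough sketch of the behavior behind steps 9-10, for the curious: the script can apply the schema without touching existing data by inserting a fresh first row and writing the schema's headers into it, which is why an imported CSV's own header ends up as row 2. the Schema shape and styling details here are assumptions, not the actual implementation.

```ts
// sketch only: apply schema headers to an existing sheet without touching
// its data. the Schema interface is a hypothetical placeholder.
interface Schema {
  name: string;     // must match the sheet name exactly, e.g. "Manuscript"
  fields: string[]; // ordered column headers
}

function applyHeaders(sheet: GoogleAppsScript.Spreadsheet.Sheet, schema: Schema): void {
  // push existing rows (including any imported CSV header) down one row
  sheet.insertRowBefore(1);
  // write the schema's headers into the new, empty first row
  sheet.getRange(1, 1, 1, schema.fields.length).setValues([schema.fields]);
  // freeze the header row so it stays visible while scrolling
  sheet.setFrozenRows(1);
}
```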

continue by repeating steps 5-11 with the other two data files in this folder, 2020-02-10-story_instances.csv and 2020-02-10-canonical_stories.csv, renaming their sheets to "Story Instance" and "Canonical Story" respectively. when you're finished, you can delete the default empty sheet "Sheet1".

  12. let's now apply our schema to all the sheets. from the PEMM menu, choose "set up all sheets". this one will take a bit longer; you can switch to the other sheets to watch it in action. remember that it isn't finished until the green "Running script" message goes away.
  13. notice that we got two new sheets: Collection and Story Origin were created for us. the "set up all sheets" option will create any sheets that you don't yet have, which means you can use it to create the entire spreadsheet without any data if you want. if you do have data, it will leave the data alone.
  14. check the Story Instance, Canonical Story, and Manuscript sheets and repeat step 10, checking to make sure that the headers in row 2 match the headers in row 1. delete the extra headers when finished.
  15. notice that we now have some red triangles on some cells: data validation has been applied. the "set up all sheets" option automatically applies validation to all sheets when it's finished. note that "set up active sheet" does not do this: validation can't be applied until all sheets are set up, because some sheets depend on each other for values. (see the sketch just below.)
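to make step 15 concrete: list-style validation rules usually point at a range on another sheet, which is why every sheet has to exist before validation can be applied. a minimal sketch, with the column positions invented for the example:

```ts
// sketch only: a dropdown-style rule whose allowed values live on another
// sheet. the sheet names come from the schema; the columns are invented.
function applyCollectionValidation(): void {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const manuscript = ss.getSheetByName('Manuscript');
  const collection = ss.getSheetByName('Collection');
  if (!manuscript || !collection) return; // both sheets must exist first

  // allowed values: everything in Collection column A below the header
  const rule = SpreadsheetApp.newDataValidation()
    .requireValueInRange(collection.getRange('A2:A'), true) // show as dropdown
    .setAllowInvalid(true) // flag bad values with a warning marker rather than rejecting
    .build();

  // apply the rule to a (hypothetical) collection column on the Manuscript sheet
  manuscript.getRange('B2:B').setDataValidation(rule);
}
```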

if you got through everything, congratulations!! leave any final comments and close the issue.

thatbudakguy commented 4 years ago

based on conversation with @rlskoeser today:

WendyLBelcher commented 4 years ago

I'm testing. I imported the Manuscripts sheet. The headers in that sheet have two matching problems. Here is a screenshot. First, Total Stories in the second row is the correct wording of the header, but it should appear under what is called Number of Stories in the first row. Second, the Note column from the second row should also appear in the first row of the Manuscripts sheet; every sheet should have a Note field. [screenshot: testing Feb 19 2020]

WendyLBelcher commented 4 years ago

I imported the Story Instance sheet. One header doesn't match, and the wording in a few cases is off. First, Order of Miracle in the first row should be worded as it is in the second row: Miracle Number. Second, Macomber incipit in the second row should appear in the first row, and Note in the second row should appear in the first row. Third, Print Versions in the first row should be deleted; it appears on the Canonical Story sheet (where it should be). [screenshot: testing Feb 19 2020 Story]

WendyLBelcher commented 4 years ago

I imported the Canonical Story sheet. We need the Incipit and Incipit Source from the first row, and the English Translation, Print Version, and Notes columns from the second row. [screenshot: testing Feb 19 2020 canonical]

WendyLBelcher commented 4 years ago

I did not do steps 12-15, as I assumed we needed to correct the matching first. One concern I have is that I corrected the headers based on my memory of what should happen. I should really go and check the meeting logs for decisions on them.

rlskoeser commented 4 years ago

I did notice that the header on the manuscripts sheet was duplicated when I ran set up all sheets — is that expected behavior, and a result of our choice to keep things simple? I think I'm ok with it, it doesn't seem like a step we will do a lot and deleting the extra row doesn't seem like a big problem, but I was curious.

I'll run through this again when I have revised CSV data and hand off for Wendy to do a final review. (I'll leave the duplicate header)

rlskoeser commented 4 years ago

@thatbudakguy should a simplified version of these setup notes be added to the readme?

thatbudakguy commented 4 years ago

@rlskoeser it's a good question. my instinct is that these apply directly enough to the project data that they aren't really instructions for setting up "local dev", and so don't need to be in the README. we could maybe add a note that, once you create a spreadsheet, you can proceed to import data and then run some of these functions.

thatbudakguy commented 4 years ago

maybe they're more like DEPLOYNOTES?

rlskoeser commented 4 years ago

@thatbudakguy I like that idea — that way we can document the process for converting the macomber text file to csv and setting up the initial google sheets doc with validation.

rlskoeser commented 4 years ago

@thatbudakguy I think the longitude validation rule got lost in the refactor — it's defined, but I don't see it being applied anywhere.

Other than that, I think the functionality for setting up the sheets and applying validation is working well and I'm willing to sign off once the missing validation is fixed (unless you want to make the case for folding that into other data model changes).

thatbudakguy commented 4 years ago

@rlskoeser just confirmed that this was the case. I've re-added the application line.
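for anyone following along, this is the shape of the problem: defining a rule does nothing until it's attached to a range, so the "application line" is easy to drop in a refactor. the sheet and column below are placeholders, not the project's actual ones:

```ts
// sketch only: a numeric rule is easy to define and then forget to apply.
// the sheet and column passed in are placeholders for wherever longitude lives.
const longitudeRule = SpreadsheetApp.newDataValidation()
  .requireNumberBetween(-180, 180) // valid longitude range
  .setHelpText('Longitude must be a number between -180 and 180.')
  .build();

function applyLongitudeValidation(
  sheet: GoogleAppsScript.Spreadsheet.Sheet,
  column: string
): void {
  // without this line, the rule above exists but never takes effect
  sheet.getRange(`${column}2:${column}`).setDataValidation(longitudeRule);
}
```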

rlskoeser commented 4 years ago

Applying validation via the new menu is working well. The problem I had with not being prompted to sign in seems to have been a transitory outage yesterday; it's working fine today.