Closed sarahjeansweeney closed 8 years ago
^ what I've come up with so far for a "preview" page on the MODS loader
This looks great so far. I have a few questions/suggestions:
Is the XML editable? I think we talked about this yesterday but somehow my memory is fuzzy.
Initial pass at mods diff
Yes, the load start and depositor names changes are helpful. Preview looks great!
@sarahjeansweeney for fields with authorities - how do you want to handle those? will they be another field in the spreadsheet? then we can just assign them like we assign any other field value.
@sarahjeansweeney for the "place of publication" field in the spreadsheet, we currently have two fields under place - one for city and one for state (both have the field name placeTerm but with different attributes)
t.place(path: 'place', namespace_prefix: 'mods') {
  t.city_term(path: 'placeTerm', namespace_prefix: 'mods', attributes: { type: 'text' })
  t.state_term(path: 'placeTerm', namespace_prefix: 'mods', attributes: { type: 'code', authority: 'marccountry' })
}
I am wondering if we "made up" this distinction between city and state with different attributes. Would you like to continue with this method? If so, I would advocate for making city and state separate fields in the spreadsheet. Parsing on the comma would be unreliable: when there is no comma, how would we know whether the value is a city or a state?
@sarahjeansweeney what does the "reformatting quality" field in the spreadsheet map to in mods?
re: authorities: This just came up the other day, and it was decided we would add a new column to the spreadsheet so the catalogers could explicitly state the value's authority.
re: place: As far as I can tell, the place of publication value shouldn't need to be split into city and state. It should map to the originInfo/place/placeTerm element, which doesn't have subelements. subject/hierarchicalGeographic terms, by contrast, do split into city, state, country, etc. subelements.
@elizoller re: reformattingQuality: it is a subelement of physicalDescription, e.g.:
<mods:physicalDescription>
  <mods:extent>1 postcard : 9 x 14.2 cm.</mods:extent>
  <mods:digitalOrigin>reformatted digital</mods:digitalOrigin>
  <mods:reformattingQuality>access</mods:reformattingQuality>
</mods:physicalDescription>
It only has three allowed values: access, preservation, replacement
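Since the list of allowed values is that short, the loader could check the spreadsheet cell before writing the element. A minimal sketch, assuming a hypothetical helper name (`normalize_reformatting_quality`) and assuming we want to trim and downcase whatever the cataloger typed; neither is taken from the existing loader code:

```ruby
# Hypothetical helper: names are illustrative, not from the loader code.
# The three values are the ones listed above for mods:reformattingQuality.
REFORMATTING_QUALITY_VALUES = %w[access preservation replacement].freeze

def normalize_reformatting_quality(raw)
  # Assumption: surrounding whitespace and capitalization are not meaningful.
  value = raw.to_s.strip.downcase
  return nil if value.empty? # blank cell: caller emits no element

  unless REFORMATTING_QUALITY_VALUES.include?(value)
    raise ArgumentError, "invalid reformattingQuality: #{raw.inspect}"
  end

  value
end
```

Raising on a bad value (rather than silently dropping it) would let the preview page report the problem row before anything is loaded, but that is a design choice to confirm.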
Ok, thanks. On further investigation of the city_term/state_term thing, the only place in the code that seems to use this functionality is the IPTC loaders, which pass IPTC city and state values (IPTC stores them separately). Should I merge those into a single string and store it as a single placeTerm in the MODS? (And keep the MODS spreadsheet handling with a single place field.)
@elizoller yes, for the originInfo placeTerm field they should be in a single string. the spreadsheet will stay the same.
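For the IPTC side, the merge could look roughly like this. It is only a sketch: `merged_place_term` is a hypothetical name, and joining with a comma and a space is an assumption about the desired display form, not something decided in this thread:

```ruby
# Hypothetical helper: combines separately stored IPTC city and state values
# into the single string used for originInfo/place/placeTerm.
def merged_place_term(city, state)
  # Drop nil or blank parts so a missing state does not leave a dangling comma.
  parts = [city, state].map { |part| part.to_s.strip }.reject(&:empty?)
  parts.empty? ? nil : parts.join(", ")
end

merged_place_term("Boston", "Massachusetts") # => "Boston, Massachusetts"
merged_place_term("Boston", nil)             # => "Boston"
```

Returning nil when both parts are blank lets the caller skip the placeTerm element entirely.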
for table of contents are we just doing top level element like
<mods:tableOfContents>text goes here</mods:tableOfContents>
Yup, table of contents is pretty simple.
Thinking about user experience for all our new loaders, let's organize them based on what the user is doing, not by uploaded file type:
Metadata Overwrite Tool
New File Loader
I'm running into a system error when I try to use the spreadsheet loader on staging:
I'll bet a very large drink with a tiny umbrella in it, it's because it's Thursday. I'll restart some things.
FWIW it was working earlier.
Staging should be fixed now.
Two spreadsheet fields aren't being processed into MODS: Supplied title ("Is this a supplied title?") and any of the name affiliation fields. I'll send the spreadsheet over slack, but the resulting record is here: http://cerberus.library.northeastern.edu/files/neu:nz806157s
WHEN YOU DEPLOY: Run console loop to mark all previous load reports as completed = true (default value is false)
I tested the upload spreadsheet + file upload process this morning and there are a few issues with how the MODS is being generated.
<category>miscellany</category>
, which isn't a valid SO category. I didn't test a full MODS spreadsheet, just what lives in the Board of Trustees spreadsheet, but I'll try that next to make sure it wasn't just how the spreadsheet was formatted. I'll also share the spreadsheet I used over Slack.
Here are the records I created, if that helps: http://cerberus.library.northeastern.edu/files/neu:nz8062499 http://cerberus.library.northeastern.edu/files/neu:nz8062545 http://cerberus.library.northeastern.edu/files/neu:nz806252m
@elizoller do you mean all previous load reports?
Yeah sorry
Just tried another load. The preview page displayed the metadata as expected, with relatedItems and good dates, etc:
But when the record loaded, it loaded with the same MODS issues described above (http://cerberus.library.northeastern.edu/files/neu:nz806286f):
Probably caching Sarah, which will be a quick patch. I'll manually expunge it to see if it works
Getting this message now:
I think this is because there is an empty column header. I will do a better check for that.
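A rough idea of what that check could look like, assuming the header row has already been read into an array of cell values. `blank_header_columns` is a made-up name and is not how the loader currently reads the sheet:

```ruby
# Hypothetical helper: returns the 1-based positions of header cells that are
# nil or whitespace-only, so the loader can report them instead of failing.
def blank_header_columns(header_row)
  header_row.each_with_index
            .select { |header, _index| header.to_s.strip.empty? }
            .map { |_header, index| index + 1 }
end

blank_header_columns(["Title", nil, "Date", "  "]) # => [2, 4]
```

With the positions in hand, the preview step could show a message such as "column 2 has no header" rather than a system error.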
Just tested again and noticed a few other things:
Personal name: The given name for a personal name creator field was inserted in the MODS valueURI attribute:
Became
<mods:name type="personal" valueURI="Sarah">
  <mods:namePart type="given">Sarah</mods:namePart>
  <mods:namePart type="family">Sweeney</mods:namePart>
  <mods:namePart/>
  <mods:role>
    <mods:roleTerm authority="marcrelator" authorityURI="http://id.loc.gov/vocabulary/relators" type="text">Creator</mods:roleTerm>
  </mods:role>
  <mods:namePart type="termsOfAddress">Ms.</mods:namePart>
  <mods:namePart type="date">1923 -</mods:namePart>
</mods:name>
Corporate name: I may not have formatted this correctly, but the role and the URIs are missing from corporate name fields:
Became
<mods:name type="corporate" usage="primary">
  <mods:namePart>Northeastern University (Boston, Mass.). Board of Trustees</mods:namePart>
</mods:name>
Subject Topic: The authority value and URI were inserted in the same attribute:
Became
<mods:subject authority="lcsh | URI">
  <mods:topic>College trustees</mods:topic>
  <mods:topic>Massachusetts</mods:topic>
  <mods:topic>Boston</mods:topic>
</mods:subject>
Subject Name: The field value for subject name was inserted into the valueURI attribute, with a backslash (\) escaping the apostrophe:
Became
<mods:subject>
  <mods:name type="corporate" authority="lcsh" valueURI="Boston Young Men\'s Christian Association">
    <mods:namePart>Boston Young Men's Christian Association</mods:namePart>
  </mods:name>
</mods:subject>
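One possible guard against names landing in valueURI is to only set the attribute when the cell actually parses as an http(s) URI. This is a sketch of that idea, not the fix itself; `value_uri_or_nil` is a hypothetical name, and restricting to http/https is an assumption about what the catalogers will enter:

```ruby
require "uri"

# Hypothetical helper: returns the trimmed value when it is an absolute
# http(s) URI with a host, otherwise nil so no valueURI attribute is written.
def value_uri_or_nil(candidate)
  text = candidate.to_s.strip
  return nil if text.empty?

  uri = URI.parse(text)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  (uri.is_a?(URI::HTTP) && !uri.host.nil?) ? text : nil
rescue URI::InvalidURIError
  # Values with spaces (for example a corporate name) fail to parse at all.
  nil
end
```

With a check like this, a given name such as "Sarah" or a heading such as "Boston Young Men's Christian Association" would be rejected, and only the URI column's value would reach the attribute.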
Let's use "topical subject heading" and "name subject heading" for consistency in the subject column headers.
Fixes are in for the above issues with URIs and deployed to staging (e04af55d5cf6425061bd30d650c1fc6cf030e10c)
Here's the grouper group for loaders: northeastern:drs:repository:loaders:spreadsheet
We'll sort out the full permissions for all the loaders later.
Should be finished with ca6ca2a5fb66610f6c805b537d06d6b55d45fff2
Spreadsheet loader for just metadata: Archives staff are digitizing materials and depositing directly into the DRS (see https://repository.library.northeastern.edu/collections/neu:rx913r06d and https://repository.library.northeastern.edu/collections/neu:rx913v50k). After they upload the file they enter metadata into a Google form, which populates a Google spreadsheet. We will need a loader to process the spreadsheet, create metadata records, and replace the original stub records with the new metadata.
Spreadsheet loader for metadata and files: Same as above, but the files will be loaded along with the metadata.