geobtaa / metadata

Processing issues and Handbook for metadata schema and workflow.
https://gin.btaa.org/metadata/

Ingest Sanborn maps from Illinois #188

Closed karenmajewicz closed 1 year ago

karenmajewicz commented 1 year ago

OAI ingest

https://digital.library.illinois.edu/oai-pmh?verb=ListSets

Sanborn:

6ff64b00-072d-0130-c5bb-0019b9e633c5-2 Sanborn Fire Insurance Maps

karenmajewicz commented 1 year ago

Working link for a previously submitted set:

https://digital.library.illinois.edu/oai-pmh/idhh?verb=ListRecords&set=810d3fa0-e3fb-012f-c5b6-0019b9e633c5-a&metadataPrefix=oai_idhh

karenmajewicz commented 1 year ago

The same URL format does not work for this set:

https://digital.library.illinois.edu/oai-pmh/idhh?verb=ListRecords&set=6ff64b00-072d-0130-c5bb-0019b9e633c5-2&metadataPrefix=oai_idhh

karenmajewicz commented 1 year ago

The OAI endpoint is not returning data as JSON.

karenmajewicz commented 1 year ago

Working link:

https://digital.library.illinois.edu/oai-pmh/?verb=ListRecords&set=6ff64b00-072d-0130-c5bb-0019b9e633c5-2&metadataPrefix=oai_dc

However, it only returns 100 items per page.

karenmajewicz commented 1 year ago

https://digital.library.illinois.edu/oai-pmh/?verb=ListRecords&resumptionToken=frg:1ss19o55-527q-5685-p0oo-5564o4r188p0-7|fgneg:755|zrgnqngnCersvk:bnv_qp

karenmajewicz commented 1 year ago

resumptionToken: frg:1ss19o55-527q-5685-p0oo-5564o4r188p0-7|fgneg:755|zrgnqngnCersvk:bnv_qp

karenmajewicz commented 1 year ago

illinoisHarvest.csv

karenmajewicz commented 1 year ago

Steps:

Part 1: Get the files via OAI

  1. Use the OAI-PMH validator tool at https://validator.oaipmh.com
  2. Go to the Download XML tab
  3. Enter the base URL (https://digital.library.illinois.edu/oai-pmh) and the set name (6ff64b00-072d-0130-c5bb-0019b9e633c5-2)
  4. Wait for the app to pull all the XML files and download them (ideally in a ZIP, but sometimes that doesn't work and you need to click on each file)
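
For cases where the validator's ZIP download fails, the harvest in Part 1 could also be scripted. The following is a minimal sketch, not the workflow used in this issue: it assumes the endpoint follows the standard OAI-PMH paging behavior seen above (100 records per page plus a resumptionToken), and the names `fetch_url` and `harvest_pages` are hypothetical helpers introduced here for illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def fetch_url(url):
    """Download one OAI-PMH response page and return its bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest_pages(base_url, set_spec, metadata_prefix="oai_dc", fetch=fetch_url):
    """Yield each ListRecords page as bytes, following resumptionToken."""
    params = {
        "verb": "ListRecords",
        "set": set_spec,
        "metadataPrefix": metadata_prefix,
    }
    while True:
        # urlencode percent-escapes characters such as "|" and ":" in the token.
        page = fetch(base_url + "?" + urllib.parse.urlencode(params))
        yield page
        token = ET.fromstring(page).find(".//oai:resumptionToken", OAI_NS)
        if token is None or not (token.text or "").strip():
            break  # a missing or empty token marks the last page
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Calling `harvest_pages("https://digital.library.illinois.edu/oai-pmh", "6ff64b00-072d-0130-c5bb-0019b9e633c5-2")` and writing each yielded page to its own `.xml` file would produce roughly the same set of files as the validator download. The `fetch` parameter exists only so the paging logic can be exercised without a network connection.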

Part 2: turn the records into a CSV via OpenRefine

  1. Start OpenRefine
  2. Choose "Get Data from this Computer" and upload the XML files
  3. In the parsing options, select the "record" element from the header

Part 3: Collapse multivalued cells

  1. The multi-valued cells will start out being grouped together by which XML file they came from. We don't want that, so remove the column called File.
  2. Now they are grouped by the value "http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd". Leave this for now.
  3. There are multiple Identifiers (dc:identifier), so select that column and choose Edit cells > Join multi-valued cells
  4. Move the Identifier column to the beginning so that items will be grouped by these unique values
  5. Collapse the remaining multi-valued cells with the same Join multi-valued cells function
  6. Export to CSV
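
Parts 2 and 3 can be approximated in a script as well. This is a rough sketch rather than a replacement for the OpenRefine steps: it assumes the harvested pages are standard oai_dc records, it joins repeated values with "|" (an assumed separator, not necessarily the one chosen in OpenRefine), and `records_to_rows` and `rows_to_csv` are hypothetical helper names.

```python
import csv
import io
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"
SEPARATOR = "|"  # assumed delimiter for joined multi-valued cells


def records_to_rows(xml_bytes):
    """Return one dict per OAI record, joining repeated dc elements."""
    rows = []
    for record in ET.fromstring(xml_bytes).iter(OAI + "record"):
        values_by_field = {}
        for element in record.iter():
            text = (element.text or "").strip()
            if element.tag.startswith(DC) and text:
                field = element.tag[len(DC):]
                values_by_field.setdefault(field, []).append(text)
        if values_by_field:  # records without dc metadata are skipped
            rows.append(
                {field: SEPARATOR.join(values)
                 for field, values in values_by_field.items()}
            )
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with the identifier column first."""
    fields = sorted({field for row in rows for field in row})
    if "identifier" in fields:
        fields.remove("identifier")
        fields.insert(0, "identifier")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Running `records_to_rows` over each downloaded XML file, concatenating the results, and passing them to `rows_to_csv` gives one row per record with the joined identifiers in the first column, which mirrors steps 3 through 6 above.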

karenmajewicz commented 1 year ago

https://geo.btaa.org/catalog/02d-02