cwrc / RDF-extraction


Data extraction: Orlando items within CWRC #16

Open jefferya opened 7 years ago

jefferya commented 7 years ago

Test Orlando data within the CWRC repository:

How to access it via the Islandora REST API: https://github.com/discoverygarden/islandora_rest/blob/7.x/README.md

The following pseudocode describes how to work with Orlando Islandora objects via the Islandora REST API.

Definitions:

Pseudocode

given a {PID}

// look up the properties of the object via the REST endpoint
{SERVER_NAME}/islandora/rest/v1/object/{PID}
parse the JSON response and save the "models" property

// look up the appropriate datastream for the extractor, based on the saved "models" property
* if "models" contains "info:fedora/cwrc:citationCModel" // bibl
  * then {DSID} = "MODS"
* if "models" contains "info:fedora/cwrc:documentCModel" // entry or event
  * then {DSID} = "CWRC"
* if "models" contains "info:fedora/cwrc:organization-entityCModel"
  * then {DSID} = "ORGANIZATION"
* if "models" contains "info:fedora/cwrc:person-entityCModel"
  * then {DSID} = "PERSON"

// look up the content via the REST endpoint
* {SERVER_NAME}/islandora/rest/v1/object/{PID}/datastream/{DSID}/?content=true
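
The pseudocode above could be sketched in Python roughly as follows. This is only a minimal sketch under a few assumptions: the function names and the `opener` parameter are made up for illustration, authentication is not handled, and I am not certain whether the server returns model names with or without the `info:fedora/` prefix, so the sketch accepts both.

```python
import json
from urllib.request import urlopen

FEDORA_PREFIX = "info:fedora/"

# Content model -> datastream ID, copied from the pseudocode above.
MODEL_TO_DSID = {
    "info:fedora/cwrc:citationCModel": "MODS",  # bibl
    "info:fedora/cwrc:documentCModel": "CWRC",  # entry or event
    "info:fedora/cwrc:organization-entityCModel": "ORGANIZATION",
    "info:fedora/cwrc:person-entityCModel": "PERSON",
}


def object_url(server, pid):
    """URL that returns the object's properties as JSON."""
    return f"{server}/islandora/rest/v1/object/{pid}"


def datastream_url(server, pid, dsid):
    """URL that returns the content of one datastream."""
    return f"{server}/islandora/rest/v1/object/{pid}/datastream/{dsid}/?content=true"


def dsid_for_models(models):
    """Return the DSID of the first recognised content model, or None."""
    for model in models:
        # Assumption: tolerate model names returned without the prefix.
        if not model.startswith(FEDORA_PREFIX):
            model = FEDORA_PREFIX + model
        if model in MODEL_TO_DSID:
            return MODEL_TO_DSID[model]
    return None


def fetch_content(server, pid, opener=urlopen):
    """Look up the object, pick its datastream, and return (dsid, bytes)."""
    with opener(object_url(server, pid)) as resp:
        props = json.load(resp)
    dsid = dsid_for_models(props.get("models", []))
    if dsid is None:
        raise ValueError(f"no known content model for {pid}")
    with opener(datastream_url(server, pid, dsid)) as resp:
        return dsid, resp.read()
```

Passing a different `opener` makes it possible to try the lookup logic without touching the server, e.g. in a unit test for an extractor.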

Example REST calls:

List of PIDs

joelacummings commented 7 years ago

Notes:

Will use UUIDs for rdf:links, where available, to link back to Orlando. The bibliography is the low-hanging fruit, so start there and then move on. Use existing tools to convert to RDF if possible.

joelacummings commented 7 years ago

I would like to propose using skos:editorialNote, skos:historyNote, and skos:changeNote to migrate the recordInfo element seen in the MODS extraction. It seems worth tracking how records change, and I believe these concepts would make for a reasonable translation. They would need to be introduced into the CWRC ontology first.

<recordInfo>
   <recordContentSource>Orlando, Cambridge University Press</recordContentSource>
   <recordCreationDate encoding="w3cdtf">2016-12-21</recordCreationDate>
   <recordChangeDate encoding="w3cdtf">2016-12-21</recordChangeDate>
   <recordIdentifier source="The Orlando Project">laurma-b.xml</recordIdentifier>
   <recordOrigin>MODS record has been created from an SGML record using an XSLT stylesheet.</recordOrigin>
   <languageOfCataloging>
      <languageTerm type="code" authority="iso639-2b">eng</languageTerm>
      <languageTerm type="text">English</languageTerm>
   </languageOfCataloging>
</recordInfo>

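
To make the proposal a bit more concrete, here is a rough Python sketch of how a recordInfo element like the one above might be turned into SKOS note triples. Which MODS element goes to which SKOS property is purely my assumption for illustration and has not been agreed on; the subject URI, `NOTE_MAPPING`, and `record_info_notes` are hypothetical names as well. Triples are returned as plain tuples rather than through an RDF library.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"

# HYPOTHETICAL mapping, for discussion only: nothing here is decided.
NOTE_MAPPING = {
    "recordOrigin": SKOS + "editorialNote",
    "recordCreationDate": SKOS + "historyNote",
    "recordChangeDate": SKOS + "changeNote",
}


def record_info_notes(record_info_xml, subject):
    """Return (subject, predicate, literal) tuples for the mapped children."""
    root = ET.fromstring(record_info_xml)
    triples = []
    for child in root:
        # Drop a "{namespace}" prefix if the element is namespaced.
        tag = child.tag.rsplit("}", 1)[-1]
        predicate = NOTE_MAPPING.get(tag)
        text = (child.text or "").strip()
        if predicate and text:
            triples.append((subject, predicate, text))
    return triples


# Shortened copy of the recordInfo element shown above.
SAMPLE = """<recordInfo>
   <recordContentSource>Orlando, Cambridge University Press</recordContentSource>
   <recordCreationDate encoding="w3cdtf">2016-12-21</recordCreationDate>
   <recordChangeDate encoding="w3cdtf">2016-12-21</recordChangeDate>
   <recordOrigin>MODS record has been created from an SGML record using an XSLT stylesheet.</recordOrigin>
</recordInfo>"""

# The subject URI is a placeholder, not a real CWRC identifier.
for triple in record_info_notes(SAMPLE, "http://example.org/bibl/laurma-b"):
    print(triple)
```

Elements without an entry in the mapping (recordContentSource, for example) are simply skipped here; whether they should be carried over some other way is a separate question.
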
jefferya commented 6 years ago

@SusanBrown @ilovan @joelacummings Updated original with new information as per yesterday's meeting.