NRGI / resource-projects-etl

ETL processes for rp.org
GNU General Public License v2.0

Specify workflow for updates #26

Open timgdavies opened 9 years ago

timgdavies commented 9 years ago

We need to describe a clear workflow for updates to data.

This ticket is a placeholder.

timgdavies commented 9 years ago

Handling manual edits

It would be useful to allow minor edits to be made through a web interface. Ontowiki supports this, but we need to capture the history of edits, so they can be re-applied if data is re-imported in future.

Using the history

When edits are made through Ontowiki, a history is maintained via the Erfurt API in the versioning actions and versioning payloads tables (ef_versioning_actions and ef_versioning_payloads).

The query:


SELECT *
FROM DB.DBA.ef_versioning_actions
LEFT JOIN DB.DBA.ef_versioning_payloads
  ON DB.DBA.ef_versioning_payloads.id = payload_id
WHERE model = 'http://resourceprojects.org/'

fetches all the changes stored in the history, together with their serialized payloads. For example, the following payloads represent removing the labels 'Uk' and 'UK' from 'http://resourceprojects.org/country/gb':

 a:1:{s:38:"http://resourceprojects.org/country/gb";a:1:{s:45:"http://www.w3.org/2004/02/skos/core#prefLabel";a:1:{i:0;a:3:{s:5:"value";s:2:"Uk";s:4:"type";s:7:"literal";s:4:"lang";s:2:"en";}}}}
 a:1:{s:38:"http://resourceprojects.org/country/gb";a:1:{s:45:"http://www.w3.org/2004/02/skos/core#prefLabel";a:1:{i:0;a:3:{s:5:"value";s:2:"UK";s:4:"type";s:7:"literal";s:4:"lang";s:2:"en";}}}}

We need to identify the meaning of s:38 etc., and to work out how we know this is a command to remove triples.
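
On the s:38 question: these payloads look like the output of PHP's serialize() function (Erfurt is written in PHP). In that format, a:1:{...} is an array with one key/value pair, s:38:"..." is a string of 38 bytes, and i:0; is the integer 0. Below is a minimal Python sketch for decoding them. It only handles the a, s and i types seen in the examples above, and the names parse_php_serialized and payload_statements are made up for illustration. Note that the payload only lists statements; nothing in it marks them as added or removed, so that distinction presumably comes from the actions table (an assumption, not checked here).

```python
# Sketch only: decodes the subset of PHP serialize() output seen in the
# payloads above (a = array, s = string, i = integer). Function names are
# hypothetical and not part of Erfurt or Ontowiki.

def _parse_value(buf: bytes, pos: int):
    """Parse one value starting at pos; return (value, next position)."""
    kind = buf[pos:pos + 1]
    if kind == b"i":
        # i:<integer>;
        end = buf.index(b";", pos)
        return int(buf[pos + 2:end]), end + 1
    if kind == b"s":
        # s:<byte length>:"<bytes>";
        colon = buf.index(b":", pos + 2)
        length = int(buf[pos + 2:colon])
        start = colon + 2  # skip the colon and the opening quote
        end = start + length
        if buf[end:end + 2] != b'";':
            raise ValueError("bad string terminator at byte %d" % end)
        return buf[start:end].decode("utf-8"), end + 2
    if kind == b"a":
        # a:<pair count>:{<key><value>...}
        colon = buf.index(b":", pos + 2)
        count = int(buf[pos + 2:colon])
        pos = colon + 2  # skip the colon and the opening brace
        result = {}
        for _ in range(count):
            key, pos = _parse_value(buf, pos)
            value, pos = _parse_value(buf, pos)
            result[key] = value
        if buf[pos:pos + 1] != b"}":
            raise ValueError("expected closing brace at byte %d" % pos)
        return result, pos + 1
    raise ValueError("unsupported type %r at byte %d" % (kind, pos))


def parse_php_serialized(data: str):
    """Decode a serialized payload into nested Python dicts."""
    buf = data.strip().encode("utf-8")  # PHP lengths count bytes, not characters
    value, pos = _parse_value(buf, 0)
    if pos != len(buf):
        raise ValueError("trailing data after byte %d" % pos)
    return value


def payload_statements(data: str):
    """Flatten a payload into (subject, predicate, value, lang) tuples."""
    statements = []
    for subject, predicates in parse_php_serialized(data).items():
        for predicate, objects in predicates.items():
            for obj in objects.values():
                statements.append((subject, predicate, obj["value"], obj.get("lang")))
    return statements


# First example payload from above.
EXAMPLE = 'a:1:{s:38:"http://resourceprojects.org/country/gb";a:1:{s:45:"http://www.w3.org/2004/02/skos/core#prefLabel";a:1:{i:0;a:3:{s:5:"value";s:2:"Uk";s:4:"type";s:7:"literal";s:4:"lang";s:2:"en";}}}}'

if __name__ == "__main__":
    for statement in payload_statements(EXAMPLE):
        print(statement)
```

Read this way, each payload is a subject → predicate → list-of-objects structure, i.e. a set of triples, which should be enough to re-apply an edit once we know whether the matching action was an add or a delete.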