Swirrl / ons-data-export

Temporary repo to keep track of the extraction of data from the PMD3-backed alpha for the COGS project into the PMD4 staging server.

Importing new (or updating existing) datasets in PMD4 requires ongoing manual data fixes #18

Closed: jennet closed this issue 4 years ago

jennet commented 4 years ago

At the moment, the (temporary) process for getting datasets into the PMD4 staging system is to:

  1. extract a dataset's data via a Stardog query
  2. also extract any reference data associated with it via qb:codeList
  3. automatically construct some additional PMD4 catalog structural data
  4. manually fix any data problems found
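To make step 2 concrete, here is a minimal sketch of a query that pulls the reference data a dataset links to via qb:codeList. It assumes the standard W3C Data Cube layout (dataset → qb:structure → qb:component → qb:dimension, with qb:codeList on the dimension property and concepts attached to the list via skos:inScheme); the actual PMD3 data may attach code lists elsewhere. The function names `codelist_query` and `request_url`, and all URIs, are hypothetical placeholders, not the extraction query this repo actually uses.

```python
from urllib.parse import urlencode

# Standard Data Cube and SKOS namespace prefixes.
PREFIXES = """PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
"""


def codelist_query(dataset_uri: str) -> str:
    """Hypothetical helper: build a SPARQL CONSTRUCT query returning every
    code list reachable from the dataset via qb:codeList, plus the triples
    describing the concepts in each list.

    Assumes qb:codeList sits on the dimension property (Data Cube spec
    layout); adjust the WHERE pattern if the data attaches it elsewhere.
    """
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in dataset_uri for ch in "<> \"{}|\\^`"):
        raise ValueError(f"not a safe IRI: {dataset_uri!r}")
    return PREFIXES + f"""CONSTRUCT {{
  ?list ?lp ?lo .
  ?concept ?cp ?co .
}}
WHERE {{
  <{dataset_uri}> qb:structure/qb:component/qb:dimension ?dim .
  ?dim qb:codeList ?list .
  {{ ?list ?lp ?lo . }}
  UNION
  {{ ?concept skos:inScheme ?list ; ?cp ?co . }}
}}"""


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as a SPARQL 1.1 Protocol GET request URL.
    The endpoint is a placeholder for whatever SPARQL endpoint PMD3 exposes."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    q = codelist_query("http://example.org/data/some-dataset")  # hypothetical URI
    print(request_url("http://example.org/sparql", q))  # hypothetical endpoint
```

The UNION keeps the code-list triples and the concept triples in separate branches, which avoids a cross product between the two sets of bindings that an OPTIONAL join would produce.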

Data problems tend to be reintroduced whenever new data is extracted from PMD3, so the same fixes have to be reapplied after each import.

More details will be added in separate issues.

jennet commented 4 years ago

Issues #20 and #21 can be addressed by ONS in the configuration setup.

Issues #22 and #24 are both most likely caused by stale data in PMD3. Swirrl could potentially remove the stale data from PMD3 manually, so that it does not have to be fixed again after every import.

The ongoing discussion around dimensions and code lists (which affects what configuration changes may be required) will need to progress before we can address issues #23 and #15.

jennet commented 4 years ago

The plan of action is to continue logging issues that can be addressed by configuration changes in this repo, and to track them here: https://github.com/Swirrl/ons-data-export/projects/1