GSA / data.gov

Main repository for the data.gov service
https://data.gov

Save CKAN ID to HarvestRecord #4791

Closed btylerburton closed 3 days ago

btylerburton commented 2 weeks ago

User Story

In order to keep track of packages once they are created in CKAN, datagovteam wants to store the ckan_id which is returned after package_create.

This will then be utilized on package_update and package_delete.

Acceptance Criteria

Background

To delete or update the dataset, the ckan package id is needed.
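As a rough illustration of why the id has to be kept around, here is a minimal sketch of how a harvester might pick the CKAN action and payload for a record. The helper name `build_ckan_call` and the `action` strings are hypothetical and not taken from the harvester code; the only thing assumed about CKAN is that `package_update` and `package_delete` identify the dataset through an `id` key.

```python
def build_ckan_call(action, metadata, ckan_id=None):
    """Return (ckan_action_name, payload) for a harvest record.

    Hypothetical helper: "create" needs no id, while "update" and
    "delete" cannot be built without the stored ckan_id.
    """
    if action == "create":
        return "package_create", dict(metadata)

    if ckan_id is None:
        # Without the id saved at create time there is nothing to target.
        raise ValueError(f"ckan_id is required for action {action!r}")

    if action == "update":
        return "package_update", {**metadata, "id": ckan_id}
    if action == "delete":
        return "package_delete", {"id": ckan_id}

    raise ValueError(f"unknown action {action!r}")
```

With this shape, a record that was never given a `ckan_id` fails loudly on update or delete instead of silently creating a second dataset.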

Security Considerations (required)

None

Sketch

btylerburton commented 2 weeks ago

We could use either below for key relationships:

I was leaning towards the former as it's 1:1, but if we're updating a dataset multiple times you'd get multiple entries in the lookup table which would then necessitate a date to sort them by.

Does it make better sense to use harvest_source.id:harvest_record.identifier?

  1. No lookups. We have both of those values available at the time we'd need to use them.
  2. Unless the identifier changed, we'd already have an existing entry that we could just defer to after the create.
  3. BUT ... if a data provider shuffles identifiers around, we could end up with things in a disconnected state.
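The trade-off in point 3 can be sketched with a plain in-memory mapping. This is only an illustration under assumed names (`remember`, `lookup`, and the sample ids are all made up), not how the harvest DB is laid out.

```python
# Hypothetical lookup keyed by (harvest_source.id, harvest_record.identifier).
ckan_ids = {}


def remember(source_id, identifier, ckan_id):
    """Store the CKAN id under the composite key after a create."""
    ckan_ids[(source_id, identifier)] = ckan_id


def lookup(source_id, identifier):
    """Return the stored CKAN id, or None if this key was never seen."""
    return ckan_ids.get((source_id, identifier))


remember("source-1", "dataset-a", "ckan-1")

# Same identifier on the next harvest: the id is found, no extra lookup table.
found = lookup("source-1", "dataset-a")

# The provider renames the identifier: the key misses, so the dataset would
# look new and the existing CKAN package would be left disconnected.
missed = lookup("source-1", "dataset-b")
```

Here `found` holds the stored id while `missed` is `None`, which is the disconnected state described above.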

rshewitt commented 2 weeks ago

The harvest record table has a ckan_id field. We could include the CKAN id in our sync success update. Once we get the id from CKAN after creation, I would expect it not to change, even with an update? So we could add a ckan_id attribute to the record instance, assign the CKAN id to the record instance on package create, then check that the value is not None in the update self in db method and add it to the update dict?
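A minimal sketch of that flow, under stated assumptions: `Record`, `FakeCkan`, `FakeDb`, `create_in_ckan`, `update_self_in_db`, and `update_harvest_record` are illustrative names, not the harvester's actual classes or methods, and the fake client only mimics the fact that CKAN's `package_create` returns the created package, including its `id`.

```python
from dataclasses import dataclass
from typing import Optional


class FakeCkan:
    """Stand-in for a CKAN client (hypothetical); returns a package dict with an id."""

    def package_create(self, **dataset):
        return {**dataset, "id": "ckan-1234"}


class FakeDb:
    """Stand-in for the harvest DB interface (hypothetical)."""

    def __init__(self):
        self.rows = {}

    def update_harvest_record(self, record_id, data):
        self.rows.setdefault(record_id, {}).update(data)


@dataclass
class Record:
    id: str
    metadata: dict
    ckan_id: Optional[str] = None  # filled in once CKAN has created the package
    status: Optional[str] = None

    def create_in_ckan(self, ckan):
        result = ckan.package_create(**self.metadata)
        # Keep the id CKAN assigned so later updates and deletes can use it.
        self.ckan_id = result["id"]

    def update_self_in_db(self, db):
        data = {"status": self.status}
        # Only write ckan_id when we actually have one.
        if self.ckan_id is not None:
            data["ckan_id"] = self.ckan_id
        db.update_harvest_record(self.id, data)
```

Usage would look roughly like this:

```python
db = FakeDb()
record = Record(id="r1", metadata={"name": "example-dataset"})
record.create_in_ckan(FakeCkan())
record.status = "success"
record.update_self_in_db(db)
```

Because the `None` check lives in the DB update, a record that failed before reaching CKAN would not overwrite an existing ckan_id with an empty value.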

btylerburton commented 2 weeks ago

the harvest record table has a ckan_id field.

That it does! Ok. Now we'd just need code changes to:

Jin-Sun-tts commented 5 days ago

Retrieved the ckan_id from the CKAN package_create API call.

Tested it in development: deleted all the datasets in the test org and cleaned the records in the harvest DB. After re-harvesting, the ckan_id is updated in the harvest_record table.