GSA / data.gov

Main repository for the data.gov service
https://data.gov

Put Harvest Records to DB after Harvest Runner Compare #4733

Closed: btylerburton closed this issue 1 month ago

btylerburton commented 1 month ago

User Story

In order to create an accurate baseline for our catalog records, datagovteam wants to write every harvested record that the compare step marks to be created, updated, or deleted into the harvest_records DB table.
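A minimal sketch of that write, assuming the compare step yields lists of (identifier, source_hash) pairs keyed by action. The table columns, the `write_compare_results` helper, and the use of SQLite in place of the harvester's actual database are all hypothetical. Unchanged records are skipped here, which is still an open question in this thread.

```python
import sqlite3

# Hypothetical schema; the real harvest_records table may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS harvest_records (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    harvest_job_id TEXT NOT NULL,
    identifier TEXT NOT NULL,
    action TEXT NOT NULL CHECK (action IN ('create', 'update', 'delete')),
    source_hash TEXT,
    status TEXT  -- left NULL until the CKAN sync reports a result
)
"""


def write_compare_results(conn, harvest_job_id, compare):
    """Insert one row per record the compare step flagged.

    `compare` is assumed to map an action name to a list of
    (identifier, source_hash) pairs. Any other keys (e.g. unchanged
    records) are ignored.
    """
    rows = [
        (harvest_job_id, identifier, action, source_hash)
        for action in ("create", "update", "delete")
        for identifier, source_hash in compare.get(action, [])
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO harvest_records "
            "(harvest_job_id, identifier, action, source_hash) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```

Writing all rows in one transaction means a failed job leaves no partial baseline behind.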

Depends upon: #4744

Acceptance Criteria


Background

Our Harvest DB should be the source of truth.

Security Considerations (required)

None

Sketch

rshewitt commented 1 month ago

wanted to put this somewhere

reading harvest records

btylerburton commented 1 month ago

open question: should we also record unchanged records?

btylerburton commented 1 month ago

updated ticket with dependency on #4744

@GSA/data-gov-dev-team I'm dropping this at the top of the Harvester 2.0 backlog, but please do review the AC/Sketch for completeness.

rshewitt commented 1 month ago

Status may need another value in the enum, something like "pending", because we write the compare results before writing the records to CKAN. A status of "success" doesn't really make sense if the record isn't on CKAN yet, right? And "error" doesn't make sense when no error has occurred. My thinking is that status represents the sync status.

NVM, it's a nullable field.

jbrown-xentity commented 1 month ago

Correct. Status should be one of three things (I would think):

btylerburton commented 1 month ago

Should we just make status a nullable value? That way we wouldn't post any status until we get a success or failure.
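With a nullable status, NULL can stand for "written after compare, not yet synced to CKAN", and only success or error is ever written. A minimal sketch of that convention, again using SQLite and hypothetical names (`SyncStatus`, `set_sync_status`, `pending_record_ids`) rather than the harvester's real models:

```python
import sqlite3
from enum import Enum

# Hypothetical minimal table: NULL status means "not synced to CKAN yet".
SCHEMA = """
CREATE TABLE IF NOT EXISTS harvest_records (
    id INTEGER PRIMARY KEY,
    identifier TEXT NOT NULL,
    status TEXT CHECK (status IS NULL OR status IN ('success', 'error'))
)
"""


class SyncStatus(str, Enum):
    # Only final outcomes are stored; there is no explicit "pending" value.
    SUCCESS = "success"
    ERROR = "error"


def set_sync_status(conn, record_id: int, status: SyncStatus) -> None:
    """Record the CKAN sync outcome once it is known."""
    with conn:
        conn.execute(
            "UPDATE harvest_records SET status = ? WHERE id = ?",
            (status.value, record_id),
        )


def pending_record_ids(conn) -> list[int]:
    """IDs of records written after compare but not yet synced."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT id FROM harvest_records WHERE status IS NULL ORDER BY id"
        )
    ]
```

An explicit "pending" enum value would also work; the nullable column just avoids storing a status that only means "nothing has happened yet".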