Decouple Harvesting to Flask Admin App from pushing to CKAN

btylerburton commented 3 months ago

User Story

To make the H2.0 application processes more discrete, datagovteam wants to separate harvesting records into the Flask admin app from pushing those records to CKAN.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

[ ] GIVEN I want to harvest a valid harvest source WHEN the Flask admin app has harvested the records in full and without job-level errors THEN the harvesting app will invoke the sync to CKAN
[ ] GIVEN the sync to CKAN runs independently of the harvesting process THEN I should be able to run the sync any number of times without reharvesting the records into the Flask admin app.

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

[ ] Decouple harvesting of records to Flask Admin from the Sync to CKAN
[ ] Create triggers that can invoke a CKAN sync independent of a harvest invocation
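A minimal sketch of how the two steps above might be separated, assuming an in-memory job object and a plain dict standing in for CKAN. `HarvestJob`, `harvest`, `sync_to_ckan`, and `on_harvest_complete` are hypothetical names for illustration, not the H2.0 API.

```python
from dataclasses import dataclass, field


@dataclass
class HarvestJob:
    """Hypothetical job record; the real H2.0 model may differ."""
    job_id: str
    records: list = field(default_factory=list)
    errors: list = field(default_factory=list)
    status: str = "new"


def harvest(job, source_records):
    """Step 1: store harvested records against the job. Never touches CKAN."""
    for rec in source_records:
        if "identifier" not in rec:
            job.errors.append(f"missing identifier: {rec!r}")
            continue
        job.records.append(rec)
    job.status = "error" if job.errors else "harvested"
    return job


def sync_to_ckan(job, ckan):
    """Step 2: push stored records to CKAN (here, a dict keyed by identifier)."""
    if job.status != "harvested":
        raise ValueError(f"job {job.job_id} is not ready to sync ({job.status})")
    for rec in job.records:
        ckan[rec["identifier"]] = rec  # upsert, so a re-run adds no duplicates
    return len(job.records)


def on_harvest_complete(job, ckan):
    """Trigger: invoke the sync only for a complete, error-free harvest."""
    if job.status == "harvested":
        return sync_to_ckan(job, ckan)
    return 0
```

Because `sync_to_ckan` reads only what `harvest` already stored, it can be called again for the same job without reharvesting, which is the behavior the second acceptance criterion asks for.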

rshewitt commented 3 months ago

how will the sync-to-ckan process know which data to use?

btylerburton commented 3 months ago

Do you think we'd be able to use harvest_job_id to fetch all records posted during write_compare_to_db? I believe so, but will confirm in a test.
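One way this lookup could work, sketched with the standard-library `sqlite3` module and an in-memory database. The `harvest_record` table and its columns are assumptions made for the example; only `harvest_job_id` comes from the discussion above.

```python
import sqlite3


def records_for_job(conn, harvest_job_id):
    """Return every record written for one harvest job, in insertion order."""
    cur = conn.execute(
        "SELECT identifier, action FROM harvest_record "
        "WHERE harvest_job_id = ? ORDER BY id",
        (harvest_job_id,),
    )
    return cur.fetchall()


# Demo data: an in-memory database standing in for the harvest DB.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE harvest_record ("
    "id INTEGER PRIMARY KEY, harvest_job_id TEXT, identifier TEXT, action TEXT)"
)
conn.executemany(
    "INSERT INTO harvest_record (harvest_job_id, identifier, action) "
    "VALUES (?, ?, ?)",
    [("job-1", "a", "create"), ("job-1", "b", "update"), ("job-2", "c", "create")],
)

print(records_for_job(conn, "job-1"))
```

If every record written during the compare step carries its job id, the sync needs nothing from the harvest run beyond that id.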

rshewitt commented 3 months ago

ah yeah

rshewitt commented 1 month ago

a potential solution could be isolating the ckan-sync functionality as its own app (e.g. datagov-harvest-ckan-sync) where the task is similar to the harvester runner (i.e. python harvester/ckan-sync.py {job_id}). this would operate independently of the runner. another could be having it just be another task within the datagov-harvest-runner app.
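A rough sketch of what such an entry point might look like, using `argparse` to take the job id as a positional argument. `run_sync` is a hypothetical placeholder that only reports what it would do; the real sync logic is not shown here.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; expects a single positional job id."""
    parser = argparse.ArgumentParser(
        description="Sync one harvest job's stored records to CKAN."
    )
    parser.add_argument(
        "job_id", help="harvest job whose stored records should be synced"
    )
    return parser.parse_args(argv)


def run_sync(job_id):
    """Hypothetical placeholder for the real sync step."""
    return f"syncing records for harvest job {job_id}"


def main(argv=None):
    args = parse_args(argv)
    print(run_sync(args.job_id))
    return 0

# In a standalone script, finish with: raise SystemExit(main())
```

The same `main` could be called from a task inside the existing runner app, so this shape does not commit to either of the two options.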

GSA / data.gov