GSA / data.gov

Main repository for the data.gov service
https://data.gov

Decouple Harvesting to Flask Admin App from pushing to CKAN #4865

Open btylerburton opened 3 months ago

btylerburton commented 3 months ago

User Story

In order to make the H2.0 application processes more discrete, datagovteam wants to separate harvesting records into the Flask app from pushing those records to CKAN.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

rshewitt commented 3 months ago

how will the sync-to-ckan process know which data to use?

btylerburton commented 3 months ago

Do you think we'd be able to use harvest_job_id to fetch all records posted during write_compare_to_db? I believe so, but will confirm in a test.
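A minimal sketch of what that lookup could look like, assuming the records written during write_compare_to_db carry a harvest_job_id column. The table name harvest_record, the other column names, and the use of an in-memory SQLite database are all hypothetical stand-ins for illustration; the real schema and database may differ.

```python
import sqlite3


def fetch_records_for_job(conn, harvest_job_id):
    """Return every record row tagged with the given harvest job id."""
    cur = conn.execute(
        "SELECT id, identifier, action FROM harvest_record "
        "WHERE harvest_job_id = ? ORDER BY id",
        (harvest_job_id,),  # parameterized to avoid SQL injection
    )
    return cur.fetchall()


# Hypothetical schema and sample data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE harvest_record ("
    "id INTEGER PRIMARY KEY, harvest_job_id TEXT, identifier TEXT, action TEXT)"
)
conn.executemany(
    "INSERT INTO harvest_record (harvest_job_id, identifier, action) "
    "VALUES (?, ?, ?)",
    [
        ("job-1", "ds-a", "create"),
        ("job-1", "ds-b", "update"),
        ("job-2", "ds-c", "create"),
    ],
)

records = fetch_records_for_job(conn, "job-1")
```

Here only the two rows written under job-1 come back, so a separate sync step would need nothing beyond the job id to know which data to push.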

rshewitt commented 3 months ago

ah yeah

rshewitt commented 1 month ago

a potential solution could be isolating the ckan-sync functionality as its own app (e.g. datagov-harvest-ckan-sync), where the task is invoked like the harvester runner (i.e. python harvester/ckan-sync.py {job_id}). this would operate independently of the runner. another option could be making it just another task within the datagov-harvest-runner app.
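A rough sketch of the standalone entry point described above, assuming it takes a single job_id argument. The function names parse_args, sync_job, and main, and the fetch_records / push_record callables, are hypothetical; the actual CKAN call and database lookup are left as injected dependencies rather than guessed at.

```python
import argparse


def parse_args(argv=None):
    """Parse the single positional job_id, mirroring `ckan-sync.py {job_id}`."""
    parser = argparse.ArgumentParser(
        description="Push the records of one harvest job to CKAN"
    )
    parser.add_argument("job_id", help="harvest job whose records are synced")
    return parser.parse_args(argv)


def sync_job(job_id, fetch_records, push_record):
    """Push each record for job_id; collect failures instead of aborting.

    fetch_records(job_id) -> iterable of records (hypothetical DB lookup)
    push_record(record)   -> None, raises on failure (hypothetical CKAN call)
    """
    synced = 0
    failed = []
    for record in fetch_records(job_id):
        try:
            push_record(record)
            synced += 1
        except Exception as exc:  # one bad record should not stop the job
            failed.append((record, str(exc)))
    return synced, failed


def main(argv, fetch_records, push_record):
    """Wire the CLI argument to the sync loop and report a summary."""
    args = parse_args(argv)
    synced, failed = sync_job(args.job_id, fetch_records, push_record)
    print(f"job {args.job_id}: {synced} synced, {len(failed)} failed")
    return 1 if failed else 0
```

Because the sync loop only depends on a job id and two callables, the same function could back either option: a separate app's script calling main from an `if __name__ == "__main__":` guard, or an extra task inside the existing runner app.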