Open btylerburton opened 3 months ago
how will the sync-to-ckan process know which data to use?
Do you think we'd be able to use `harvest_job_id` to fetch all records posted during `write_compare_to_db`? I believe so, but will confirm in a test.
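To make the question above concrete, here is a minimal sketch of looking up records by job ID. It assumes a hypothetical `harvest_record` table with `identifier`, `harvest_job_id`, and `action` columns, and uses an in-memory SQLite database as a stand-in; the real schema and database used by the harvester may differ.

```python
import sqlite3


def records_for_job(conn, harvest_job_id):
    """Return (identifier, action) rows written for one harvest job.

    The table and column names are assumptions for illustration only.
    """
    cur = conn.execute(
        "SELECT identifier, action FROM harvest_record "
        "WHERE harvest_job_id = ? ORDER BY identifier",
        (harvest_job_id,),
    )
    return cur.fetchall()


def demo_connection():
    """Build an in-memory database holding records from two jobs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE harvest_record "
        "(identifier TEXT, harvest_job_id TEXT, action TEXT)"
    )
    conn.executemany(
        "INSERT INTO harvest_record VALUES (?, ?, ?)",
        [
            ("a", "job-1", "create"),
            ("b", "job-1", "update"),
            ("c", "job-2", "create"),
        ],
    )
    return conn
```

If every record row carries the job ID that wrote it, the sync step only needs that one ID to find its input.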
ah yeah
a potential solution could be isolating the ckan-sync functionality as its own app (e.g. `datagov-harvest-ckan-sync`), where the task is invoked much like the harvester runner (i.e. `python harvester/ckan-sync.py {job_id}`). this would operate independently of the runner. another option could be making it just another task within the `datagov-harvest-runner` app.
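A rough sketch of what a standalone entry point taking a job ID might look like. The function `sync_job_to_ckan` is a hypothetical placeholder and does not call CKAN; only the argument handling is illustrated.

```python
import argparse


def sync_job_to_ckan(job_id):
    """Hypothetical placeholder: a real task would load the job's
    records and push them to CKAN. Here it only reports the job ID."""
    return f"would sync records for job {job_id}"


def parse_args(argv=None):
    """Parse the single positional job_id argument."""
    parser = argparse.ArgumentParser(
        description="Sync one harvest job's records to CKAN (sketch)."
    )
    parser.add_argument("job_id", help="ID of the harvest job to sync")
    return parser.parse_args(argv)


def main(argv=None):
    """Entry point; returns a process exit code."""
    args = parse_args(argv)
    print(sync_job_to_ckan(args.job_id))
    return 0
```

A script would typically end with `raise SystemExit(main())` under an `if __name__ == "__main__":` guard. The same `main` could also be registered as a task inside the existing runner app, so the two options discussed above need not diverge much in code.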
User Story
In order to make the H2.0 application processes more discrete, datagovteam wants to separate the harvesting of records into the Flask app from the pushing of those records to CKAN.
Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
[ ] GIVEN I want to harvest a valid harvest source, WHEN the Flask admin app has harvested the records in full and without job-level errors, THEN the harvesting app will invoke the sync to CKAN.
[ ] GIVEN the sync to CKAN occurs independently of the harvesting process, THEN I should be able to run the sync N times without having to reharvest the records to Flask Admin.
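The two criteria above could be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `HarvestJob`, `sync_to_ckan`, and `finish_harvest` are hypothetical names, and a plain dict stands in for CKAN.

```python
from dataclasses import dataclass, field


@dataclass
class HarvestJob:
    """Hypothetical view of a finished harvest job."""
    job_id: str
    records: dict  # identifier -> metadata already stored by the harvest
    errors: list = field(default_factory=list)  # job-level errors


def sync_to_ckan(job, ckan):
    """Upsert the job's stored records into `ckan` (a dict stand-in).

    Because it only reads stored records and overwrites by identifier,
    repeating the call leaves `ckan` in the same state.
    """
    for identifier, metadata in job.records.items():
        ckan[identifier] = metadata
    return len(job.records)


def finish_harvest(job, ckan):
    """Invoke the sync only when the harvest had no job-level errors."""
    if job.errors:
        return False
    sync_to_ckan(job, ckan)
    return True
```

Keeping `sync_to_ckan` free of any harvesting logic is what would allow it to be rerun against the same job without reharvesting.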
Background
[Any helpful contextual notes or links to artifacts/evidence, if needed]
Security Considerations (required)
[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]
Sketch