Closed btylerburton closed 3 weeks ago
With https://github.com/GSA/datagov-harvesting-logic/pull/23 and https://github.com/GSA/datagov-harvester/pull/6, this should be done.
For evidence: a log string from the extract() function in datagov-harvesting-logic being printed by a DAG running in a local dockerized instance of datagov-harvester.
Also, with https://github.com/GSA/datagov-harvesting-logic/pull/25, it looks like a test run successfully publishes as https://pypi.org/project/datagov-harvesting-logic/0.0.3.post1/.
User Story
In order to fully test integration of the datagov-harvesting-logic library, and to begin benchmarking tests against real cloud infrastructure, the datagov team would like to publish the datagov-harvesting-logic module to PyPI.
Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
[ ] GIVEN I refactor the datagov-harvesting-logic code to follow our ETL pipeline steps THEN I will see an interface that mirrors the harvesting lifecycle: extract, compare, transform, validate, load
[ ] GIVEN that some of these operations will be unfinished at this time, EVEN if the interface is linked to a no-op/naive echo process, THEN I expect the methods to be available for import
[ ] GIVEN I have refactored this code THEN I expect there to be passing tests
[ ] GIVEN I have written an interface method to a non-existent process THEN I STILL expect there to be passing tests
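The last two criteria could be checked with tests along the following lines. This is only a sketch: the stand-in module name dcat_us_stub, the helper make_stub_module, and the single-argument echo signature are assumptions made for illustration, not the library's actual layout or API.

```python
import importlib
import sys
import types

# The five lifecycle steps named in the acceptance criteria.
STEPS = ("extract", "compare", "transform", "validate", "load")


def make_stub_module(name="dcat_us_stub"):
    """Build a hypothetical stand-in module whose steps are naive echoes."""
    module = types.ModuleType(name)
    for step in STEPS:
        # Each unfinished step simply returns its input unchanged.
        setattr(module, step, lambda payload: payload)
    # Register the module so a normal import can find it.
    sys.modules[name] = module
    return module


def test_steps_are_importable():
    make_stub_module()
    module = importlib.import_module("dcat_us_stub")
    for step in STEPS:
        assert callable(getattr(module, step))


def test_noop_steps_echo_input():
    module = make_stub_module()
    record = {"identifier": "example"}
    for step in STEPS:
        assert getattr(module, step)(record) == record
```

Written this way, the tests pass while a step is still a no-op and keep passing as long as a real implementation stays importable under the same name; the echo assertion would be replaced once a step gains real behavior.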
Background
[Any helpful contextual notes or links to artifacts/evidence, if needed]
Security Considerations (required)
[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]
Sketch
extract, compare, transform, validate, load
from dcat_us import extract, compare, transform, validate, load, even if some of these are just no-op shells for the time being. dcat_us only for the time being.
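A rough illustration of what such no-op shells might look like, assuming each step takes one payload and returns it unchanged. The helper _echo, the run_pipeline function, and the logger name are hypothetical and not taken from the datagov-harvesting-logic code.

```python
import logging

logger = logging.getLogger("dcat_us")


def _echo(step_name):
    """Return a placeholder step that logs the call and echoes its input."""
    def step(payload):
        logger.info("%s: no-op shell, returning input unchanged", step_name)
        return payload
    step.__name__ = step_name
    return step


# Hypothetical placeholders, one per stage of the harvesting lifecycle.
extract = _echo("extract")
compare = _echo("compare")
transform = _echo("transform")
validate = _echo("validate")
load = _echo("load")


def run_pipeline(payload):
    """Pass the payload through every lifecycle step in order."""
    for step in (extract, compare, transform, validate, load):
        payload = step(payload)
    return payload
```

Because every shell shares one shape, a caller such as a DAG task can import and chain all five names now, and each placeholder can later be swapped for a real implementation without changing the import line.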