Closed by btylerburton 1 year ago.
Posting this here so that others can see where @Jin-Sun-tts drew inspiration for some of the work on this ticket:
dataset view route | /dataset (similar to /api/action/package_search)
def dataset_view():
    # fetch every dataset record and serialize each one to JSON
    datasets = query_dataset_table()
    json_view = [tojson(dataset) for dataset in datasets]
    return json_view
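A minimal Flask version of this route might look like the following. The in-memory `DATASETS` list is a hypothetical stand-in for `query_dataset_table()`; a real implementation would read from the database.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical stand-in for the rows returned by query_dataset_table()
DATASETS = [
    {"id": "1", "title": "Example dataset A"},
    {"id": "2", "title": "Example dataset B"},
]


@app.route("/dataset")
def dataset_view():
    # jsonify serializes the whole list into one JSON array response,
    # so the records do not need to be converted one by one first
    return jsonify(DATASETS)
```

Serializing once at the end avoids double-encoding, which can happen if each record is first turned into a JSON string and the resulting list of strings is then serialized again.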
harvest source create route | /harvest/create
def create_route(name, url):
    # both fields must be valid, so combine the checks with `and`
    if valid(name) and valid(url):
        try:
            source_id = generate_uuid()
            result = create_dataset_record(source_id, name, url)
            return json(source_id, result)
        except Exception:
            # db error?
            pass
    else:
        # respond with whether name or url was invalid
        pass
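A runnable Flask sketch of the create route, assuming the name and URL arrive as a JSON body and that "valid" simply means a non-empty string (both assumptions, not decisions from this ticket). `SOURCES` is a hypothetical in-memory stand-in for the source table, and `uuid.uuid4()` plays the role of `generate_uuid()`.

```python
import uuid

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory store standing in for the harvest source table
SOURCES = {}


def valid(value):
    # Assumed validation rule for this sketch: a non-empty string
    return isinstance(value, str) and value.strip() != ""


@app.route("/harvest/create", methods=["POST"])
def create_route():
    payload = request.get_json(silent=True) or {}
    name, url = payload.get("name"), payload.get("url")

    # Report exactly which fields failed validation
    errors = [field for field, value in (("name", name), ("url", url)) if not valid(value)]
    if errors:
        return jsonify({"invalid_fields": errors}), 400

    source_id = str(uuid.uuid4())
    SOURCES[source_id] = {"id": source_id, "name": name, "url": url}
    return jsonify(SOURCES[source_id]), 201
```

Returning the list of failing fields with a 400 status covers the "respond with whether name or url was invalid" case in one response.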
harvest source view route | /harvest/source/
def harvest_view(source_id):
    try:
        return json(query_source_table(source_id))
    except Exception:
        # not a valid source
        pass
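Catching every exception as "not a valid source" also hides database failures. One way to keep the two cases apart is to treat a missing record as a 404 and let real errors propagate. In this sketch the `<source_id>` URL variable and the in-memory `SOURCES` dict are assumptions standing in for `query_source_table()`.

```python
from flask import Flask, abort, jsonify

app = Flask(__name__)

# Hypothetical in-memory store standing in for query_source_table()
SOURCES = {"abc123": {"id": "abc123", "name": "example", "url": "https://example.com/data.json"}}


@app.route("/harvest/source/<source_id>")
def harvest_view(source_id):
    source = SOURCES.get(source_id)
    if source is None:
        # A missing record is a client error (404), distinct from a database failure
        abort(404)
    return jsonify(source)
```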
Is harvest job creation different from running?
harvest job create route | /harvest/create/
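import threading

class Job:
    def __init__(self, ...):
        self.name = ""
        self.state = ""

    def run(self):
        try:
            success, s3_paths = extract()
        except Exception:
            # job failed; nothing to compare, so stop here
            return
        try:
            # assumes the job keeps its source_id as an attribute
            working_datasets = compare(s3_paths, self.source_id)
        except Exception:
            # job failed
            return
        threads = []
        for wd in working_datasets:
            # args must be a tuple, so a single argument needs a trailing comma
            wip = threading.Thread(target=self.process_dataset, args=(wd,))
            wip.start()
            threads.append(wip)
        # join the started threads, not the input datasets
        for thread in threads:
            thread.join()
        # controller creates job summary

    def process_dataset(self, dataset):
        if validate(dataset):
            new_dataset = transform(dataset)
            # load the transformed record, not the raw input
            success = load(new_dataset)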
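Starting one thread per dataset can create hundreds of threads for a large source. A sketch using the standard library's `concurrent.futures.ThreadPoolExecutor` instead, which caps concurrency and also collects each dataset's outcome for the job summary. The `validate`/`transform`/`load` bodies, the `identifier` key, and `max_workers=4` are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the validate/transform/load steps in the sketch
def validate(dataset):
    return "title" in dataset


def transform(dataset):
    return {**dataset, "title": dataset["title"].strip()}


def load(dataset):
    return True


def process_dataset(dataset):
    # Returns (identifier, outcome) so the caller can build a job summary
    if not validate(dataset):
        return dataset.get("identifier"), "invalid"
    new_dataset = transform(dataset)
    return dataset.get("identifier"), "loaded" if load(new_dataset) else "load_failed"


def process_all(working_datasets, max_workers=4):
    # The executor bounds the number of worker threads and waits for them on exit
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(process_dataset, working_datasets))
```

`pool.map` also re-raises any exception a worker hit, so failures surface in `run()` instead of disappearing inside a thread.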
harvest job run route | /harvest/run/
harvest job summary route | /harvest/status/
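def job_summary(job_id=''):
    if job_id:
        # validate the id before querying for that one job
        if valid(job_id):
            summary = query_source_table(job_id)
        else:
            # respond that the job id is invalid
            pass
    else:
        # no job id given: summarize every job
        try:
            summary = query_source_table(all=True)
        except Exception:
            # db error
            pass
    return json(summary)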
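Flask can serve both the "all jobs" and "one job" cases from a single view by registering two URL rules and supplying a default through the `defaults` argument of `route`. In this sketch, `JOBS` is a hypothetical in-memory stand-in for the job table, and the `<job_id>` URL variable is an assumption.

```python
from flask import Flask, abort, jsonify

app = Flask(__name__)

# Hypothetical in-memory job records standing in for the database
JOBS = {
    "job-1": {"id": "job-1", "state": "complete"},
    "job-2": {"id": "job-2", "state": "in_progress"},
}


# One view serves both URLs; `defaults` supplies job_id=None for the bare path
@app.route("/harvest/status/", defaults={"job_id": None})
@app.route("/harvest/status/<job_id>")
def job_summary(job_id):
    if job_id is None:
        return jsonify(list(JOBS.values()))
    if job_id not in JOBS:
        abort(404)
    return jsonify(JOBS[job_id])
```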
dcat-us extract | /extract/???
import harvester.extract as he

def extract(source_id, job_id, url):
    if not valid(source_id):
        # respond accordingly
        pass
    if not valid(job_id):
        # respond accordingly
        pass
    success, s3_paths = he.main({"job_id": job_id, "source_id": source_id, "url": url})
    if not success:
        # update
        pass
    # Job.run() unpacks these two values
    return success, s3_paths
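The sketch calls `valid()` throughout without defining it. If source and job IDs are UUIDs, as `generate_uuid()` in the source create route suggests, one possible check is to let the standard library's `uuid.UUID` parse the string. That ID format is an assumption here.

```python
import uuid


def valid(identifier):
    """Return True if `identifier` is a well-formed UUID string (assumed ID format)."""
    if not isinstance(identifier, str):
        return False
    try:
        uuid.UUID(identifier)
    except ValueError:
        return False
    return True
```

`uuid.UUID` also accepts hex without hyphens or wrapped in braces. If only the canonical form should pass, compare `str(uuid.UUID(identifier))` with the input.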
dcat-us compare | /compare/???
dcat-us validate | /validate/???
dcat-us transform | /transform/???
dcat-us load | /load/???
interact with s3
insert/update/delete dataset from db
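The compare step decides which insert, update, or delete each dataset needs. One common approach, assumed here rather than taken from the ticket, is to compare identifiers and content hashes of the harvested records with those already in the database. Unlike `compare(s3_paths, source_id)` in the sketch, this hypothetical version takes two plain dicts.

```python
def compare(harvested, existing):
    """Split datasets into create/update/delete lists.

    Both arguments map a dataset identifier to a content hash (an assumed
    way of detecting changes; the hashing scheme itself is not shown).
    """
    harvested_ids, existing_ids = set(harvested), set(existing)
    return {
        # in the source but not yet in the db
        "create": sorted(harvested_ids - existing_ids),
        # in both, but the content hash changed
        "update": sorted(i for i in harvested_ids & existing_ids if harvested[i] != existing[i]),
        # in the db but no longer in the source
        "delete": sorted(existing_ids - harvested_ids),
    }
```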
a job can have multiple states associated with it
no search functionality yet?
no database version control yet (alembic)
no frontend UI pages (only data views)
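One reading of "a job can have multiple states" is that each job keeps a history of states. A sketch under that assumption, with an enum and an explicit table of allowed transitions; the state names and transition rules are hypothetical, since the ticket does not define them.

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical state names; the ticket does not define them
    NEW = "new"
    IN_PROGRESS = "in_progress"
    COMPLETE = "complete"
    ERROR = "error"


# Allowed transitions: a job moves forward and can fail from any active state
TRANSITIONS = {
    JobState.NEW: {JobState.IN_PROGRESS, JobState.ERROR},
    JobState.IN_PROGRESS: {JobState.COMPLETE, JobState.ERROR},
    JobState.COMPLETE: set(),
    JobState.ERROR: set(),
}


def advance(history, new_state):
    """Append `new_state` to the job's state history if the transition is allowed."""
    current = history[-1]
    if new_state not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new_state.value}")
    history.append(new_state)
    return history
```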
User Story
In order to begin work on the MVP for Harvesting 2.0, datagovteam would like to initialize a Flask application.
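Flask's documented "application factory" pattern is one way to initialize the app: each call builds a new, independently configured instance, which keeps tests isolated. A minimal sketch; the `/` health-check route is a hypothetical placeholder.

```python
from flask import Flask


def create_app(config=None):
    """Application factory: build and configure a fresh Flask app."""
    app = Flask(__name__)
    # Optional overrides, e.g. {"TESTING": True} in tests
    if config:
        app.config.update(config)

    @app.route("/")
    def index():
        # Hypothetical health check; Flask serializes a returned dict to JSON
        return {"status": "ok"}

    return app
```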
Related to:
Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
Background
[Any helpful contextual notes or links to artifacts/evidence, if needed]
Security Considerations (required)
[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]
Sketch