Closed (Daniellappv closed this 4 years ago)
ETA 6h
Some pseudocode to work as a silly tech spec (created in a meeting, so not very complex):
def transform(name=None, input_file=None):
    datasets = []
    if not input_file:
        # loop over directory structure
        if name:
            # loop over <name> scraper output, e.g. nces
            # datasets = list of all <name> files
        else:
            # loop over everything
            # datasets = list of all JSON files
    else:
        # read file, which is a list of files
        # datasets = list of all JSON files in the list

    for dataset in datasets:
        score = compute_score(dataset)

    # now write all the scores and some ID data in a pandas df
    # if S3 is configured, upload them to the bucket

def compute_score(json_obj):
    json_obj_score = "something"
    return json_obj_score
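For reference, here is a minimal runnable sketch of the spec above. Several things are assumptions and not part of the spec: the `root` parameter and the `<root>/<name>/` directory layout, the `collect_datasets` helper, the idea that the input file lists one path per line, and the placeholder score (share of non-empty top-level fields), since the real scoring is still "something". It returns plain rows and leaves out the pandas and S3 steps.

```python
import json
from pathlib import Path


def collect_datasets(root, name=None, input_file=None):
    """Return the JSON files to score, following the three branches of the spec."""
    if input_file:
        # Assumption: input_file is a text file with one path per line.
        lines = Path(input_file).read_text().splitlines()
        return [Path(line.strip()) for line in lines if line.strip().endswith(".json")]
    # Assumption: scraper output lives under <root>/<scraper name>/.
    base = Path(root) / name if name else Path(root)
    return sorted(base.rglob("*.json"))


def compute_score(json_obj):
    """Placeholder score: share of top-level fields that are non-empty."""
    if not isinstance(json_obj, dict) or not json_obj:
        return 0.0
    filled = sum(1 for value in json_obj.values() if value not in (None, "", [], {}))
    return filled / len(json_obj)


def transform(root="output", name=None, input_file=None):
    """Score every selected dataset and return one row per file."""
    rows = []
    for path in collect_datasets(root, name, input_file):
        with open(path, encoding="utf-8") as handle:
            json_obj = json.load(handle)
        rows.append({"file": str(path), "score": compute_score(json_obj)})
    return rows
```

The returned list of dicts can be passed straight to `pandas.DataFrame(rows)` for the "scores and some ID data" table; the S3 upload would then be a separate step that only runs when a bucket is configured.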
Task list updated
We have:
Description: Create a RAG (red/amber/green) summary of the results of the scraping exercise.
RAG to be determined:
Can we automate in some way? [Probably not in this sprint]
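On the automation question, one possible shape is sketched below. It reads RAG as red/amber/green and assumes each dataset ends up with a numeric score between 0 and 1. The function names `rag_status` and `rag_summary` and the thresholds 0.5 and 0.8 are hypothetical placeholders, since the actual RAG rules are still to be determined.

```python
from collections import Counter


def rag_status(score, amber_from=0.5, green_from=0.8):
    """Map a score in [0, 1] to a RAG band. Thresholds are placeholders."""
    if score >= green_from:
        return "green"
    if score >= amber_from:
        return "amber"
    return "red"


def rag_summary(scores):
    """Count how many datasets fall into each RAG band."""
    counts = Counter(rag_status(score) for score in scores)
    return {band: counts.get(band, 0) for band in ("red", "amber", "green")}
```

Keeping the thresholds as parameters means the bands can be adjusted once the RAG definition is agreed, without touching the scoring code.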
Acceptance criteria
Task-list
Link to Jira card: https://open-data-ed.atlassian.net/browse/OD-503