CivicActions / edscrapers

US Department of Education Data Scraping Kit; see https://us-ed-scraping.ckan.io/dataset
GNU Affero General Public License v3.0

Create a RAG summary of the results of the scraping exercise #11

Closed: Daniellappv closed this issue 4 years ago

Daniellappv commented 4 years ago

Description: Create a RAG (red/amber/green) summary of the results of the scraping exercise.

RAG to be determined:

Can we automate this in some way? [Probably not in this sprint]

Acceptance criteria

Task-list

Link to Jira card: https://open-data-ed.atlassian.net/browse/OD-503

nightsh commented 4 years ago

ETA 6h

nightsh commented 4 years ago

Some pseudocode to work as a silly tech spec (created in a meeting, so not very complex):

def transform(name=None, input_file=None):
    """Collect the scraped dataset files and score each one."""
    datasets = []

    if not input_file:
        # loop over the directory structure
        if name:
            # loop over the <name> scraper output, e.g. nces
            # datasets = list of all <name> files
            pass
        else:
            # loop over everything
            # datasets = list of all JSON files
            pass
    else:
        # read input_file, which contains a list of files
        # datasets = list of all JSON files in that list
        pass

    for dataset in datasets:
        score = compute_score(dataset)
        # now write all the scores and some ID data to a pandas DataFrame
        # if S3 is configured, upload them to the bucket


def compute_score(json_obj):
    """Placeholder: the scoring rule is still to be determined."""
    json_obj_score = "something"
    return json_obj_score
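
The spec above could be fleshed out roughly as follows. This is only a sketch under stated assumptions, not the project's implementation: the `output` directory layout, the one-path-per-line format of `input_file`, the `EXPECTED_FIELDS` list, the `rag_status` helper and its thresholds are all hypothetical, since the thread leaves the RAG rule "to be determined". It also assumes each JSON file holds a single object.

```python
import json
from pathlib import Path

# Hypothetical metadata fields; the thread does not say what is scored.
EXPECTED_FIELDS = ("title", "description", "publisher", "resources")


def collect_datasets(output_dir, name=None, input_file=None):
    """Return the JSON files to score, mirroring the three branches of the spec."""
    if input_file:
        # Assumption: input_file holds one JSON path per line.
        lines = Path(input_file).read_text().splitlines()
        return [Path(line.strip()) for line in lines if line.strip()]
    # Restrict to one scraper's output (e.g. nces) when a name is given.
    root = Path(output_dir) / name if name else Path(output_dir)
    return sorted(root.rglob("*.json"))


def compute_score(json_obj):
    """Fraction of EXPECTED_FIELDS that are present and non-empty (0.0 to 1.0)."""
    filled = sum(1 for field in EXPECTED_FIELDS if json_obj.get(field))
    return filled / len(EXPECTED_FIELDS)


def rag_status(score, amber=0.5, green=0.8):
    """Map a score to a RAG label; both thresholds are placeholders."""
    if score >= green:
        return "green"
    if score >= amber:
        return "amber"
    return "red"


def transform(output_dir="output", name=None, input_file=None):
    """Score every collected dataset and return one row (dict) per file."""
    rows = []
    for path in collect_datasets(output_dir, name, input_file):
        with open(path, encoding="utf-8") as handle:
            dataset = json.load(handle)
        score = compute_score(dataset)
        rows.append({"file": str(path), "score": score, "rag": rag_status(score)})
    return rows
```

The rows are plain dicts, so they can be handed to `pandas.DataFrame(rows)` for the summary table. The S3 upload step is left out here because the thread gives no details about the bucket configuration.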

osahon-okungbowa commented 4 years ago

Task list updated

nightsh commented 4 years ago

We have: