CivicActions / edscrapers

US Department of Education Data Scraping Kit; see https://us-ed-scraping.ckan.io/dataset
GNU Affero General Public License v3.0

data.json Schema changes #114

Open nightsh opened 4 years ago

nightsh commented 4 years ago

Currently we are using a validation schema for the data harvested into the portal: https://project-open-data.cio.gov/v1.1/schema/#accessLevel

However, this schema does not support a number of metadata properties we need to have, such as:

At the data.json level, we can adjust this behaviour so that it allows the needed properties. Since the files are generated by our own datajson transformers as part of the CivicActions/edscrapers process flow, we can easily change the final transformation steps to reflect our needs.

Analysis

Two options for this:

1. Remove schema validation

This has the benefit of flexibility: anything we might need to add to the structure of the data.json file would just work without touching other parts of the flow.

The caveat is, of course, that we would lose validation, which increases the risk of introducing bad data and means trusting the datajson transformer to make the final calls.

2. Fork the schema to add the missing features

Best of both worlds: we continue to have validation, but bend the rules so we can accommodate the properties we want, the way we want them.

We will have to copy the source schema, host the modified copy, and then use it as part of the generated datajson files.
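As a rough illustration of option 2, the sketch below forks a schema by copying it and declaring extra properties on the copy. Everything in it is an assumption made for illustration: `BASE_SCHEMA` is a toy stand-in rather than the real Project Open Data schema, `scraped_from` is a hypothetical property name, the `example.org` URLs are placeholders, and `unknown_properties` is only a partial check, not full JSON Schema validation.

```python
import copy

# Toy stand-in for the upstream dataset schema (assumption: NOT the real
# Project Open Data v1.1 schema, just enough structure to show the idea).
BASE_SCHEMA = {
    "id": "https://example.org/upstream/dataset.json",  # placeholder URL
    "type": "object",
    "required": ["title", "accessLevel"],
    "properties": {
        "title": {"type": "string"},
        "accessLevel": {"enum": ["public", "restricted public", "non-public"]},
    },
    "additionalProperties": False,
}


def fork_schema(base, extra_properties, new_id):
    """Return a modified copy of `base`; the original is left untouched."""
    forked = copy.deepcopy(base)
    forked["id"] = new_id  # the fork is hosted under its own identifier
    forked["properties"].update(extra_properties)
    return forked


def unknown_properties(schema, record):
    """List record keys the schema does not declare (partial check only)."""
    return sorted(set(record) - set(schema["properties"]))


# Hypothetical extra property and hosting location for the fork.
forked = fork_schema(
    BASE_SCHEMA,
    {"scraped_from": {"type": "string"}},
    "https://example.org/our-fork/dataset.json",
)

record = {
    "title": "Example dataset",
    "accessLevel": "public",
    "scraped_from": "https://example.org/some-page",
}

print(unknown_properties(BASE_SCHEMA, record))  # keys the toy base schema lacks
print(unknown_properties(forked, record))       # keys the fork still lacks
```

In a real flow, the forked schema would be passed to a full JSON Schema validator rather than to a hand-rolled check like this one.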

Recommendation:

Use option 2, i.e. an altered version of the schema, with its structure changed to match our data specs.

Based on this recommendation, specs for this are provided here

Tasks:

Acceptance criteria: