Closed machow closed 3 years ago
it's also more important to track validator results over time
so I think we might want something like
{each file} -> {check whether the SHA hash of the zip matches the prior day's run} -> {if the GTFS is new, save the validator output with the run ID}
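A rough sketch of that flow, in case it helps the discussion. It assumes the prior day's hashes are available as a plain dict keyed by file name; `prior_hashes`, `sha256_of_file`, and `feeds_to_validate` are hypothetical names, and SHA-256 is just one reasonable choice of hash, not something settled in this thread.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def feeds_to_validate(zip_paths, prior_hashes):
    """Yield (path, digest) for each zip whose hash differs from the prior run.

    prior_hashes is a hypothetical dict of file name -> digest from the
    previous day's run. Files missing from it are treated as new.
    """
    for path in zip_paths:
        digest = sha256_of_file(path)
        if prior_hashes.get(Path(path).name) != digest:
            yield path, digest
```

Only the feeds yielded here would need the validator re-run and its output saved under the current run ID; unchanged feeds could be skipped.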
Ah, that's helpful to hear! Maybe a place to start is a `validator_latest` table, and then a `gtfs_schedule_change` table with one row per agency x change in zip (that could later be unpacked into more details).
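To make "one row per agency x change in zip" concrete, here is a minimal sketch that derives such rows from daily hash records. The input shape (tuples of agency, ISO run date, zip hash) and the output column names are assumptions for illustration, not an agreed schema.

```python
from itertools import groupby
from operator import itemgetter


def schedule_changes(daily_hashes):
    """Return one row per agency x change in zip hash.

    daily_hashes: iterable of (agency, run_date, zip_hash) tuples, where
    run_date is an ISO date string so that sorting is chronological.
    """
    rows = []
    ordered = sorted(daily_hashes, key=itemgetter(0, 1))
    for agency, records in groupby(ordered, key=itemgetter(0)):
        previous = None
        for _, run_date, zip_hash in records:
            # The first record per agency is a baseline, not a change.
            if previous is not None and zip_hash != previous:
                rows.append(
                    {
                        "agency": agency,
                        "changed_on": run_date,
                        "old_hash": previous,
                        "new_hash": zip_hash,
                    }
                )
            previous = zip_hash
    return rows
```

Counting rows per agency in the result would also give a quick feel for how often each feed changes.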
(I can run checks on data changes in notebooks first to get a feel for how often it's changing, etc.)
Alright, maybe there are two stages to do this in. Will edit in a bit more detail, but wanted to put here for now.
Questions answered:
Technical:
@machow let's close this?
Right now the gtfs-validator returns a JSON payload. This notebook shows how it can be turned into a table, with 1 row per notice (see the `tidy_notice_details` table), and either..

A big question though is how BigQuery likes to deal with these kinds of tables. AFAIK there are three potential options:
I'm guessing one of the first two makes most sense.
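For reference, flattening the validator payload to one row per notice could look roughly like the sketch below. The payload shape used here (a top-level `notices` list whose entries carry `code`, `severity`, and a `sampleNotices` list) is an assumption and may not match the validator version in use; `tidy_notices` is a hypothetical helper, not the notebook's code. The per-notice fields vary by notice type, so this version keeps them as a JSON string in a single column.

```python
import json


def tidy_notices(report):
    """Flatten a validator report dict into one row per sample notice.

    Assumes report looks like:
    {"notices": [{"code": ..., "severity": ..., "sampleNotices": [...]}]}
    """
    rows = []
    for notice in report.get("notices", []):
        for sample in notice.get("sampleNotices", []):
            rows.append(
                {
                    "code": notice.get("code"),
                    "severity": notice.get("severity"),
                    # Fields differ per notice type, so store them serialized.
                    "details": json.dumps(sample, sort_keys=True),
                }
            )
    return rows
```

The resulting list of dicts has a fixed set of columns, so it can be handed to a DataFrame constructor or written out as newline-delimited JSON.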