Closed machow closed 2 years ago
Current DDL statement for loading results for vehicle positions...
CREATE OR REPLACE TABLE gtfs_rt.validation_report (
  errorMessage STRUCT<
    messageId INT64,
    gtfsRTFeedIterationModel STRING,
    validationRule STRUCT<
      errorId STRING,
      severity STRING,
      title STRING,
      errorDescription STRING,
      occurrenceSuffix STRING
    >,
    errorDetails STRING
  >,
  occurenceList ARRAY<
    STRUCT<
      occurenceId INT64,
      messageLogModel STRING,
      prefix STRING
    >
  >
)
OPTIONS (
)
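As a rough illustration of the shape this DDL expects, here is a minimal Python sketch that checks whether a result record carries every field named in the table definition. The `SCHEMA` dict is transcribed by hand from the DDL (field names spelled exactly as they appear there); `missing_fields` and the sample record are hypothetical helpers for illustration, and the sample values are placeholders, not real validator output.

```python
import json

# Field layout transcribed from the DDL above; None marks a scalar leaf.
SCHEMA = {
    "errorMessage": {
        "messageId": None,
        "gtfsRTFeedIterationModel": None,
        "validationRule": {
            "errorId": None,
            "severity": None,
            "title": None,
            "errorDescription": None,
            "occurrenceSuffix": None,
        },
        "errorDetails": None,
    },
    # A one-element list marks an ARRAY<STRUCT<...>> column
    # (names spelled as in the DDL).
    "occurenceList": [
        {"occurenceId": None, "messageLogModel": None, "prefix": None}
    ],
}


def missing_fields(record, schema=SCHEMA, path=""):
    """Return dotted paths of schema fields that are absent from record."""
    missing = []
    for name, sub in schema.items():
        here = f"{path}.{name}" if path else name
        if not isinstance(record, dict) or name not in record:
            missing.append(here)
        elif isinstance(sub, dict):
            missing.extend(missing_fields(record[name], sub, here))
        elif isinstance(sub, list):
            items = record[name] if isinstance(record[name], list) else []
            for i, item in enumerate(items):
                missing.extend(missing_fields(item, sub[0], f"{here}[{i}]"))
    return missing


# Placeholder record: every value here is made up for illustration.
sample = {
    "errorMessage": {
        "messageId": 1,
        "gtfsRTFeedIterationModel": None,
        "validationRule": {
            "errorId": "placeholder",
            "severity": "placeholder",
            "title": "placeholder",
            "errorDescription": "placeholder",
            "occurrenceSuffix": "placeholder",
        },
        "errorDetails": None,
    },
    "occurenceList": [
        {"occurenceId": 1, "messageLogModel": None, "prefix": "placeholder"}
    ],
}

print(missing_fields(sample))  # empty list when every field is present
print(json.dumps(sample))      # one row of newline-delimited JSON
```

A check like this could run before loading, so that a renamed or dropped field shows up as a named path rather than as a load error.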
TODO:
proof of concept merged here: https://github.com/cal-itp/data-infra/pull/757
would need to compress results to complete. Note that the bulk of the size comes from rule explanations being embedded in the results: https://github.com/cal-itp/data-infra/issues/786#issuecomment-983945175
requires: #786
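For the compression item above, a minimal sketch of one possible approach: write the results as newline-delimited JSON and gzip them with the Python standard library. The helper names `compress_results` and `decompress_results` are hypothetical, and the demo records are placeholders. Because the same rule explanation text repeats in every record, gzip should shrink it substantially, but the actual ratio on real validator output is not measured here.

```python
import gzip
import json


def compress_results(records):
    """Serialize records as newline-delimited JSON and gzip the bytes."""
    lines = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    return gzip.compress(lines.encode("utf-8"))


def decompress_results(blob):
    """Inverse of compress_results: gunzip and parse one record per line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


if __name__ == "__main__":
    # Placeholder records that repeat the same long explanation string.
    demo = [
        {"errorDescription": "placeholder explanation " * 20, "messageId": i}
        for i in range(100)
    ]
    raw_size = len("\n".join(json.dumps(r) for r in demo).encode("utf-8"))
    blob = compress_results(demo)
    print(raw_size, len(blob))
    assert decompress_results(blob) == demo
```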
In a gtfs_rt_transform DAG, create a KubernetesPodOperator task that does the following:
To set up for validation:
{tmp_dir}/rt
{filename}_{date}, so we can tell the validator to sort by file name
Running and extracting validation results:
rt-processed/validation/{date}_{itp id}_{url number}.json
Loading results:
rt-processed
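The naming templates above can be sketched as small Python helpers. This is only an illustration of the templates as written: the function names are hypothetical, the date is assumed to arrive as an already formatted string (an ISO 8601 style timestamp is assumed in the usage lines so that sorting by name matches time order), and nothing here calls Airflow or the validator.

```python
from pathlib import PurePosixPath


def staging_dir(tmp_dir):
    """Directory for files staged for validation: {tmp_dir}/rt."""
    return PurePosixPath(tmp_dir) / "rt"


def staged_name(filename, date):
    """Staged file name, {filename}_{date}, so files can be sorted by name."""
    return f"{filename}_{date}"


def result_path(date, itp_id, url_number):
    """Result location: rt-processed/validation/{date}_{itp id}_{url number}.json."""
    return f"rt-processed/validation/{date}_{itp_id}_{url_number}.json"


if __name__ == "__main__":
    # Placeholder inputs; the file name and timestamps are made up.
    stamps = ["2021-12-01T00:00:40", "2021-12-01T00:00:00", "2021-12-01T00:00:20"]
    names = sorted(staged_name("feed_file", s) for s in stamps)
    print(staging_dir("/tmp/work"))
    print(names)
    print(result_path("2021-12-01", 1, 0))
```

With a fixed-width timestamp as the suffix, sorting the names as plain strings orders the files by time for a given file name, which is what sorting by file name relies on.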