cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0

Create a GTFS RT daily validation notices table #357

Closed (machow closed this 2 years ago)

machow commented 3 years ago

In a gtfs_rt_transform DAG, create a KubernetesPodOperator task that does the following:

To set up for validation:

Running and extracting validation results:

Loading results:
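The three stages above could run as one container entrypoint that the KubernetesPodOperator launches. Here is a minimal sketch, assuming the caller supplies each stage as a function. `run_daily_validation` and its `setup`, `validate`, and `load` parameters are hypothetical names, not code from data-infra:

```python
from datetime import date
from typing import Callable, Iterable


def run_daily_validation(
    run_date: date,
    setup: Callable[[date], str],
    validate: Callable[[str], Iterable[dict]],
    load: Callable[[date, list], int],
) -> int:
    """Run one day's GTFS RT validation inside the pod (hypothetical entrypoint).

    setup:    stages the inputs for run_date and returns a working directory.
    validate: runs the validator in that directory and yields parsed result records.
    load:     writes the records for run_date and returns how many were loaded.
    """
    # 1. Set up for validation (e.g. fetch the day's feeds into a local directory).
    workdir = setup(run_date)
    # 2. Run the validator and extract its results as plain dicts.
    results = list(validate(workdir))
    # 3. Load the results (e.g. into gtfs_rt.validation_report).
    return load(run_date, results)


if __name__ == "__main__":
    # Stand-in stages so the sketch runs end to end without external services.
    loaded = run_daily_validation(
        date(2021, 12, 1),
        setup=lambda d: f"/tmp/gtfs_rt/{d.isoformat()}",
        validate=lambda wd: [{"workdir": wd}],
        load=lambda d, rows: len(rows),
    )
    print(loaded)  # 1
```

Passing the stages in keeps the ordering logic testable without a cluster. In the real task each stage would talk to storage and BigQuery.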

machow commented 2 years ago

Current DDL statement for the table that holds vehicle positions validation results:

```sql
CREATE OR REPLACE TABLE gtfs_rt.validation_report (
    errorMessage STRUCT<
      messageId INT64,
      gtfsRTFeedIterationModel STRING,
      validationRule STRUCT<
        errorId STRING,
        severity STRING,
        title STRING,
        errorDescription STRING,
        occurrenceSuffix STRING
      >,
      errorDetails STRING
    >,
    occurenceList ARRAY<
      STRUCT<
        occurenceId INT64,
        messageLogModel STRING,
        prefix STRING
      >
    >
)
OPTIONS (
)
```
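BigQuery JSON load jobs read newline-delimited JSON (one object per line). If the validator's report arrives as a single JSON array, it would need reshaping before it can be loaded into the table above. Here is a minimal sketch, assuming each result record is a dict whose keys match the DDL's column names (including its `occurenceList` spelling). `to_ndjson` and `count_occurrences_by_rule` are hypothetical helpers. The second shows one possible per-rule summary for a daily notices table:

```python
import json
from collections import Counter


def to_ndjson(results: list) -> str:
    """Serialize validator result records as newline-delimited JSON for a BigQuery load job."""
    # BigQuery's newline-delimited JSON format expects exactly one JSON object per line.
    return "".join(json.dumps(record) + "\n" for record in results)


def count_occurrences_by_rule(results: list) -> Counter:
    """Total the occurrences of each validation rule (by errorId) across result records."""
    counts = Counter()
    for record in results:
        error_id = record["errorMessage"]["validationRule"]["errorId"]
        # A record with a missing or null occurrence list contributes zero occurrences.
        counts[error_id] += len(record.get("occurenceList") or [])
    return counts
```

Writing `to_ndjson(results)` to a file gives input that a BigQuery load job with source format `NEWLINE_DELIMITED_JSON` could append to the table. The per-rule counts are only one possible shape for the daily notices summary the issue title asks for.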

TODO:

machow commented 2 years ago

Proof of concept merged here: https://github.com/cal-itp/data-infra/pull/757

Would need to compress the results to complete this. Note that most of the size comes from the rule explanations embedded in the results: https://github.com/cal-itp/data-infra/issues/786#issuecomment-983945175

requires: #786
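One possible way to shrink the stored results, assuming a rule's explanation text (`title`, `errorDescription`, `occurrenceSuffix`) is identical wherever the same `errorId` appears: keep only the short rule fields in each record, keep each explanation once in a lookup keyed by `errorId`, and gzip the newline-delimited output. `strip_rule_text` and `compress_results` are hypothetical names, and this is only a sketch under that assumption:

```python
import gzip
import json

# Per-rule explanation fields, assumed to repeat verbatim in every record for that rule.
EXPLANATION_FIELDS = ("title", "errorDescription", "occurrenceSuffix")


def strip_rule_text(results: list) -> tuple:
    """Return (slim records, {errorId: explanation}) without mutating the input."""
    rules = {}
    slim = []
    for record in results:
        rule = record["errorMessage"]["validationRule"]
        # Keep the first explanation seen for each errorId.
        rules.setdefault(rule["errorId"], {k: rule.get(k) for k in EXPLANATION_FIELDS})
        # Copy the nested dicts, dropping only the explanation fields.
        slim_rule = {k: v for k, v in rule.items() if k not in EXPLANATION_FIELDS}
        slim_message = dict(record["errorMessage"], validationRule=slim_rule)
        slim.append(dict(record, errorMessage=slim_message))
    return slim, rules


def compress_results(results: list) -> tuple:
    """Return (gzipped newline-delimited JSON of slim records, explanation lookup)."""
    slim, rules = strip_rule_text(results)
    payload = "".join(json.dumps(record) + "\n" for record in slim)
    return gzip.compress(payload.encode("utf-8")), rules
```

The lookup could then be stored once (for example as a small table keyed by `errorId`) and joined back whenever a report needs the full explanation text.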