cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0

Epic (GTFS Schedule): Implement typecasting / type-checking and schema validity checks in views_staging #1151

Closed lauriemerrell closed 2 years ago

lauriemerrell commented 2 years ago

Inspired by #1148.

Right now, we don't check data types in a structured way in the GTFS schedule pipeline. In #1148, we caused issues by setting data types in the gtfs_schedule_history external table definitions without actually checking or handling them anywhere, so any violation caused a total pipeline failure.

The proper place to check and enforce data types is gtfs_views_staging, where the _clean tables are created.

To do this, we should decide how to handle data type validation failures like the one in #1148. As an interim fix there, we just used SAFE_CAST so that the data made it into views with a null in place of the corrupt value.
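
For discussion, here is a rough Python sketch of the behavior in question: cast each field, fall back to a null when the cast fails (as SAFE_CAST does), and also record which columns failed so violations are visible rather than silently nulled. This is only an illustration, not the pipeline's code. The names `safe_cast`, `clean_row`, and `STOPS_SCHEMA` are hypothetical, and the schema is an assumed subset of stops.txt.

```python
from typing import Callable, Dict, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def safe_cast(value: Optional[str], caster: Callable[[str], T]) -> Optional[T]:
    """Return caster(value), or None if the value is missing or cannot be cast."""
    if value is None:
        return None
    try:
        return caster(value.strip())
    except ValueError:
        return None


def clean_row(
    row: Dict[str, Optional[str]], schema: Dict[str, Callable]
) -> Tuple[Dict[str, object], List[str]]:
    """Cast every column in the schema; collect columns whose raw value was invalid."""
    cleaned: Dict[str, object] = {}
    violations: List[str] = []
    for column, caster in schema.items():
        raw = row.get(column)
        cast = safe_cast(raw, caster)
        cleaned[column] = cast
        # A non-empty raw value that became None is a type violation,
        # as opposed to a field that was simply left blank.
        if raw not in (None, "") and cast is None:
            violations.append(column)
    return cleaned, violations


# Hypothetical schema: an assumed subset of stops.txt columns.
STOPS_SCHEMA = {"stop_id": str, "stop_lat": float, "stop_lon": float}

cleaned, violations = clean_row(
    {"stop_id": "A1", "stop_lat": "37.77", "stop_lon": "not-a-number"},
    STOPS_SCHEMA,
)
```

With this input, `stop_lon` comes through as a null and is reported in `violations`, which is the piece that plain SAFE_CAST does not give us on its own.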

lauriemerrell commented 2 years ago

looks like this created a dupe, closing