cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0
47 stars 12 forks source link

Restructure gtfs_schedule in views to use surrogate keys #288

Closed machow closed 2 years ago

machow commented 3 years ago

Currently, for tables like trips.txt we use calitp_itp_id, calitp_url_number, trip_id as a composite primary key.

This approach works okay for manual queries, but has three limitations:

Much of this is addressed by Kimball's dimensional modeling approach, where there are roughly two kinds of tables:

Critically the keys in dimensional tables are unique over time, so e.g. if a trip entry were removed by a feed and the re-added, it would have separate keys. This allows us to easily link back to info as it existed for a calculation that was made.

machow commented 3 years ago

Addressed by #241