cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0

Easier way to analyze GTFS-RT completeness #2524

Open evansiroky opened 1 year ago

evansiroky commented 1 year ago

User story / feature request

As a transit data compliance specialist, I would like a summary of which RT feeds are not "complete", so that I can coordinate with transit agencies to help them make their realtime data complete.

Acceptance Criteria

It would be really cool if there were some kind of summary table where analysts could see the ratio of observed to scheduled trips for a given time period, route, etc. I'm also assuming that there is a large cost to analyzing a lot of our RT data at once.

Summary tables should exist for both vehicle positions and trip updates.
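
To make the shape of such a summary concrete, here is a minimal sketch in plain Python. It assumes one input row per scheduled trip, with flags for whether the trip was seen in each RT feed type. Every field name here (`service_date`, `route_id`, `in_vehicle_positions`, `in_trip_updates`) is hypothetical and is not taken from the actual warehouse tables.

```python
from collections import defaultdict

# Hypothetical input: one row per scheduled trip, flagged by whether it was
# observed in each RT feed type. Field names are illustrative assumptions.
scheduled_trips = [
    {"service_date": "2023-04-01", "route_id": "10", "trip_id": "a",
     "in_vehicle_positions": True, "in_trip_updates": True},
    {"service_date": "2023-04-01", "route_id": "10", "trip_id": "b",
     "in_vehicle_positions": True, "in_trip_updates": False},
    {"service_date": "2023-04-01", "route_id": "20", "trip_id": "c",
     "in_vehicle_positions": False, "in_trip_updates": False},
    {"service_date": "2023-04-01", "route_id": "20", "trip_id": "d",
     "in_vehicle_positions": True, "in_trip_updates": False},
]


def summarize_completeness(trips):
    """Return observed/scheduled ratios keyed by (service_date, route_id)."""
    counts = defaultdict(lambda: {"scheduled": 0, "vp": 0, "tu": 0})
    for trip in trips:
        key = (trip["service_date"], trip["route_id"])
        counts[key]["scheduled"] += 1
        # Booleans add as 0 or 1, so these are counts of observed trips.
        counts[key]["vp"] += trip["in_vehicle_positions"]
        counts[key]["tu"] += trip["in_trip_updates"]
    return {
        key: {
            "scheduled_trips": c["scheduled"],
            "vehicle_positions_ratio": c["vp"] / c["scheduled"],
            "trip_updates_ratio": c["tu"] / c["scheduled"],
        }
        for key, c in counts.items()
    }


summary = summarize_completeness(scheduled_trips)
for (service_date, route_id), row in sorted(summary.items()):
    print(service_date, route_id, row)
```

Keeping the two ratios side by side in one row is just one possible layout; separate tables per feed type would work equally well.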

Notes

I have already made two Metabase questions that try to accomplish this:

Both of them combine fct_daily_scheduled_trips and fct_observed_trips to analyze the relationship between the two. I don't know how much data these questions process, but it'd be cool to be able to very quickly get a table showing what percent of scheduled trips are accounted for in RT data, for all transit agencies, on a daily basis.
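
A minimal sketch of that kind of daily per-agency table, using an in-memory SQLite database so it runs anywhere. Only the two table names come from this issue. The columns (`service_date`, `agency`, `trip_id`), the join keys, and the sample rows are assumptions for illustration and will not match the real schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy stand-ins for the two tables; the column names are assumed.
conn.executescript("""
CREATE TABLE fct_daily_scheduled_trips (service_date TEXT, agency TEXT, trip_id TEXT);
CREATE TABLE fct_observed_trips (service_date TEXT, agency TEXT, trip_id TEXT);

INSERT INTO fct_daily_scheduled_trips VALUES
  ('2023-04-01', 'Agency A', 't1'),
  ('2023-04-01', 'Agency A', 't2'),
  ('2023-04-01', 'Agency A', 't3'),
  ('2023-04-01', 'Agency A', 't4'),
  ('2023-04-01', 'Agency B', 't1'),
  ('2023-04-01', 'Agency B', 't2');

INSERT INTO fct_observed_trips VALUES
  ('2023-04-01', 'Agency A', 't1'),
  ('2023-04-01', 'Agency A', 't2'),
  ('2023-04-01', 'Agency A', 't3');
""")

# LEFT JOIN keeps scheduled trips that have no RT observation, so agencies
# with no RT data still appear with 0 percent instead of dropping out.
QUERY = """
SELECT
  s.service_date,
  s.agency,
  COUNT(DISTINCT s.trip_id) AS scheduled_trips,
  COUNT(DISTINCT o.trip_id) AS observed_trips,
  ROUND(100.0 * COUNT(DISTINCT o.trip_id) / COUNT(DISTINCT s.trip_id), 1)
    AS pct_observed
FROM fct_daily_scheduled_trips AS s
LEFT JOIN fct_observed_trips AS o
  ON o.service_date = s.service_date
 AND o.agency = s.agency
 AND o.trip_id = s.trip_id
GROUP BY s.service_date, s.agency
ORDER BY s.service_date, s.agency
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

If a result like this were materialized once per day, a dashboard would only need to read the small aggregate rather than rescanning the underlying trip tables each time.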

evansiroky commented 1 year ago

Latest question, which links to a per-route dashboard for an individual agency. At this point, I'm just wondering whether this consumes too much data or whether it can be simplified.