cal-itp / data-analyses

Place for sharing quick reports, and works in progress
https://analysis.calitp.org

MSD Dashboard Metric: general population public transit gtfs coverage #169

Closed evansiroky closed 2 years ago

evansiroky commented 3 years ago

Question

What % of California (and Californians) has (open to the public) transit coverage in GTFS?

Metrics

By area:

By Population:

By Employment (optional):

Data sources

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team. This lets us describe the request in a way that is easy to hand off for analysis. After the research phase, we will sync with the asker to figure out whether the metric and dashboard pieces are needed.

Before starting research:

After reviewing research with the asker:

evansiroky commented 2 years ago

@edasmalchi to make a first pass at creating a notebook in the https://github.com/cal-itp/data-analyses/ repo to attempt to answer this question.

edasmalchi commented 2 years ago

May be blocked on these since I either can't find the right Census data at geometries smaller than tract or they do not provide it.

The relevant table seems to be B18105 from the 2019 ACS 5-year, but it doesn’t seem like I can get a geometry smaller than tract from either data.census.gov or using their API

Rephrasing the metric along the lines of "the % of able-bodied Californians in a census tract containing a bus stop or a ferry/rail stop that has a provider with both static and realtime GTFS data" would work if tract-level data really is the best we can get. Also still pending realtime.
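As a rough illustration of the tract-level rephrasing above, the sketch below computes the share of population living in tracts flagged as containing a qualifying stop. Everything here is an assumption for illustration: the column names `tract_id`, `population` and `has_qualifying_stop`, the helper `share_population_covered`, and the sample numbers are all hypothetical. In practice the population column would come from the ACS table and the flag from a spatial join of stops to tract geometries, neither of which is shown.

```python
import pandas as pd


def share_population_covered(
    tracts: pd.DataFrame,
    pop_col: str = "population",
    covered_col: str = "has_qualifying_stop",
) -> float:
    """Return the fraction of total population living in covered tracts.

    `covered_col` is assumed to be a boolean column that is True when the
    tract contains at least one stop served by a provider with both static
    and realtime GTFS data.
    """
    total = tracts[pop_col].sum()
    if total == 0:
        raise ValueError("total population is zero; share is undefined")
    covered = tracts.loc[tracts[covered_col], pop_col].sum()
    return covered / total


# Hypothetical tract table; these are made-up numbers, not Census data.
tracts = pd.DataFrame(
    {
        "tract_id": ["T1", "T2", "T3", "T4"],
        "population": [4000, 2500, 1500, 2000],
        "has_qualifying_stop": [True, False, True, False],
    }
)

share = share_population_covered(tracts)
print(f"{share:.1%} of the sample population is in a covered tract")
```

Note that this counts a whole tract as covered if any qualifying stop falls inside it, which is the trade-off of working at tract level rather than with smaller geometries.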

machow commented 2 years ago

AFAICT there are 3 ways to tell if a provider has GTFS realtime data, and I wonder which would work here (ordered from least to most effort):

  1. A provider is listed in Airtable as having GTFS RT
  2. We have downloaded their GTFS RT data (e.g. within a certain time period)
  3. We have downloaded their GTFS RT data and confirmed it contains data for that stop's trips

I wonder if at the very least (1) would get us off the ground on reporting out this metric (on the RT side of things)..!

edasmalchi commented 2 years ago

> AFAICT there are 3 ways to tell if a provider has GTFS realtime data, and I wonder which would work here (ordered from least to most effort):

(1) seems like enough for a first pass here. I haven't actually used Airtable before. Is there a guide someplace, or a way to get access?

evansiroky commented 2 years ago

After taking a closer look at the data schema, I don't think we're blocked on realtime after all. We should just be able to fetch from gtfs_schedule_history.calitp_feeds and check if all 3 of the fields gtfs_rt_vehicle_positions_url, gtfs_rt_service_alerts_url and gtfs_rt_trip_updates_url are not null to determine if a feed has realtime.
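A minimal sketch of that check, assuming the rows of gtfs_schedule_history.calitp_feeds have already been loaded into a pandas DataFrame (how the table is queried is not shown here). The three URL column names come from the comment above; the `agency_name` column, the `flag_realtime` helper and the sample rows are hypothetical.

```python
import pandas as pd

# The three realtime URL fields named in the comment above.
RT_URL_COLUMNS = [
    "gtfs_rt_vehicle_positions_url",
    "gtfs_rt_service_alerts_url",
    "gtfs_rt_trip_updates_url",
]


def flag_realtime(feeds: pd.DataFrame) -> pd.DataFrame:
    """Add a boolean `has_realtime` column.

    A feed is flagged True only when all three realtime URL fields are
    non-null. Empty strings would still count as non-null here.
    """
    out = feeds.copy()
    out["has_realtime"] = out[RT_URL_COLUMNS].notna().all(axis=1)
    return out


# Hypothetical sample rows standing in for the real table.
feeds = pd.DataFrame(
    {
        "agency_name": ["Agency A", "Agency B", "Agency C"],
        "gtfs_rt_vehicle_positions_url": [
            "https://example.com/a/vp",
            "https://example.com/b/vp",
            None,
        ],
        "gtfs_rt_service_alerts_url": ["https://example.com/a/sa", None, None],
        "gtfs_rt_trip_updates_url": [
            "https://example.com/a/tu",
            "https://example.com/b/tu",
            None,
        ],
    }
)

flagged = flag_realtime(feeds)
print(flagged[["agency_name", "has_realtime"]])
```

Only Agency A has all three URLs, so it is the only row flagged as having realtime. If the table stores missing URLs as empty strings rather than nulls, those would need to be converted first.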