cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0

Research Question: Importance of GTFS-Flex #228

Closed (e-lo closed 2 years ago)

e-lo commented 3 years ago

Journey-planning applications have asked for some data points to help prioritize GTFS-Flex capabilities. Some data points which would help demonstrate the importance of demand-responsive transit:

  1. Number of transit services which have demand-responsive transit
  2. % of California (by area, population, employment) not served by fixed-route transit service
  3. % of California served by demand-responsive transit
  4. Total ridership of demand-responsive transit services

Priority

Med-high: This will feed into [unnamed journey planning application]'s prioritization process. There isn't a specific deadline so much as "as soon as we provide the data, they can start the process that will then start a process."

Data Sources

hunterowens commented 3 years ago

previous report and research

machow commented 3 years ago

@hunterowens and @Nkdiaz to pair in Sep. Key questions:

hunterowens commented 3 years ago

to do this, we'll need a final dataset that looks something like this

census_tract_geoid  has_demand_responsive  has_fixed_route  population  jobs  area
xxx                 True                   False            10          1     10M2
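
Given a table shaped like the one above, the "% of California" figures from the issue description could be computed roughly as follows. This is only a sketch in plain Python: the `Tract` class, the `share` helper, and the three sample rows are hypothetical and made up for illustration, and `area` is assumed to be a plain number in a single consistent unit.

```python
from dataclasses import dataclass


@dataclass
class Tract:
    census_tract_geoid: str
    has_demand_responsive: bool
    has_fixed_route: bool
    population: int
    jobs: int
    area: float  # assumed: one consistent unit across all tracts


def share(tracts, measure, predicate):
    """Fraction of the total of `measure` that falls in tracts matching `predicate`."""
    total = sum(getattr(t, measure) for t in tracts)
    if total == 0:
        return 0.0
    matched = sum(getattr(t, measure) for t in tracts if predicate(t))
    return matched / total


# Made-up rows, for illustration only (not real census data).
tracts = [
    Tract("tract_a", True, False, 10, 1, 10.0),
    Tract("tract_b", False, True, 30, 9, 5.0),
    Tract("tract_c", True, True, 60, 10, 5.0),
]

for measure in ("area", "population", "jobs"):
    no_fixed = share(tracts, measure, lambda t: not t.has_fixed_route)
    has_dr = share(tracts, measure, lambda t: t.has_demand_responsive)
    print(f"{measure}: {no_fixed:.0%} without fixed route, {has_dr:.0%} with demand responsive")
```

Weighting by each measure separately matters here: a large rural tract can dominate the area share while contributing very little to the population or jobs share.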

this will require 3ish datasets.

  1. Census Tract Population / Jobs Numbers for all of CA
  2. A list of tracts with fixed-route service: join the stops dataset to tract geometries and assume every tract that has a stop has fixed-route service
  3. A list of tracts with DR service. Take the City / County fields from the Transitstacks / NTD HQ data and assume the DR provider serves the entirety of that geography. Join to the state geoportal dataset, simplify into a single geometry, take the centroid of each tract, then point-in-polygon intersect the tract centroids with that geometry to get "has_DR" on the tracts dataset
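
The centroid and point-in-polygon steps in items 2 and 3 can be sketched without any geospatial dependencies. This is an illustration under loud assumptions, not the method used in the eventual analysis: coordinates are treated as planar, polygons are simple rings with no holes, and every name and shape below (`service_area`, the tract rings, the stop location) is hypothetical. A real analysis would more likely use a geospatial library and projected coordinates.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...] (ring not repeated)."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Degenerate (zero-area) rings would divide by zero here.
    return cx / (3 * twice_area), cy / (3 * twice_area)


def point_in_polygon(point, ring):
    """Ray-casting test; points exactly on the boundary are not treated specially."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge crosses the horizontal line through the point
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical DR service area (already simplified to one polygon) and tracts.
service_area = [(0, 0), (10, 0), (10, 10), (0, 10)]
tract_rings = {
    "tract_inside": [(2, 2), (4, 2), (4, 4), (2, 4)],
    "tract_outside": [(12, 1), (14, 1), (14, 3), (12, 3)],
    "tract_straddling": [(8, 0), (16, 0), (16, 4), (8, 4)],
}

# Item 3: a tract "has DR" if its centroid falls inside the service area.
has_dr = {
    geoid: point_in_polygon(polygon_centroid(ring), service_area)
    for geoid, ring in tract_rings.items()
}

# Item 2: a tract "has fixed route" if any stop falls inside it (hypothetical stop).
stops = [(3, 3)]
has_fixed_route = {
    geoid: any(point_in_polygon(stop, ring) for stop in stops)
    for geoid, ring in tract_rings.items()
}

print(has_dr)
print(has_fixed_route)
```

The straddling tract shows the cost of the centroid shortcut: it overlaps the service area but is marked as not served because its centroid lies outside.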
hunterowens commented 3 years ago

@hunterowens to link census datasets here for @Nkdiaz

hunterowens commented 3 years ago

data - you'll want 3 year ACS tract level population estimates.

@e-lo linked to the LODES employment statistics by tract in the issue description

machow commented 3 years ago

For data team meeting (from pairing w/ @Nkdiaz):

Intermediate next step:

Final piece:

machow commented 3 years ago

From pairing w/ @Nkdiaz, I just remembered that R has a very nice census library called tidycensus (as I recall, Python's census lib is a bit more low level). Here is an R script for pulling out the data in the format @hunterowens listed above.

May need to tweak the variables we grab (I just grabbed a couple to fill in)

library(tidycensus)
library(tidyverse)

census_api_key("<API_KEY_HERE>")

# uncomment this to see possible variables to load
# load_variables(2019, "acs5", cache = TRUE) %>% View()

# get ACS data. modify variables argument to get additional measures
# we'll transform them to wide format so each variable will be its own column
tracts <- get_acs(
  geography="tract",
  state = "CA",
  year = 2019,
  variables = c(ttl_population = "B01003_001", ttl_jobs = "B24011_001")
)

wide_tracts <- tracts %>% select(-moe) %>% spread(variable, estimate)

wide_tracts %>% write_csv("calitp_acs5_tracts.csv")

machow commented 3 years ago

To calculate the has_demand_responsive column that Hunter lists above, we need to do the following...

machow commented 3 years ago

@e-lo for this piece...

Total ridership of demand-responsive transit services

Where do total ridership numbers come from? Is it somewhere in transitstacks?

edit: nvm, we found it. The NTD stats have UPT_DR, which @Nkdiaz is planning to use. Definitely let us know if there's a better source.

machow commented 3 years ago

Blocked until the data source for demand responsive provider info is provided. See this slack thread.

machow commented 2 years ago

Here is the notebook Natalya worked on for this: https://github.com/cal-itp/data-analyses/blob/main/gtfs_flex_research_questions/calitp_flex_research_demand_responsive.ipynb

machow commented 2 years ago

The draft of this analysis is complete. Let's wait until other needs come up to re-open.

see this related issue: cal-itp/data-analyses#171

tiffanychu90 commented 2 years ago

@Nkdiaz: With the notebook, I see a couple of datasets that could be documented in intake:

The demand responsive data is fairly analysis-specific... but it comes from a Google Sheet and not our warehouse? If it's not elsewhere in the warehouse and this is a canonical source, then maybe it should live in either the warehouse or a GCS bucket and be documented in intake.