e-lo closed 2 years ago
@hunterowens and @Nkdiaz to pair in Sep. Key questions:
To do this, we'll need a final dataset that looks something like this:
census_tract_geoid | has_demand_responsive | has_fixed_route | population | jobs | area |
---|---|---|---|---|---|
xxx | True | False | 10 | 1 | 10M2 |
This will require 3ish datasets.
@hunterowens to link census datasets here for @Nkdiaz
data - you'll want 3-year ACS tract-level population estimates.
@e-lo linked to the LODES employment statistics by tract in the issue description
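As a rough sketch of how the three inputs might come together into the table above, here is a small standard-library Python example. The input dictionaries, their values, the GEOIDs, and the function name `build_tract_table` are hypothetical placeholders, not the real ACS, LODES, or service data; only the output column names come from the table above.

```python
# Hypothetical per-tract inputs keyed by census tract GEOID (made-up values).
population = {"06001400100": 10, "06001400200": 25}  # e.g. from ACS
jobs = {"06001400100": 1}                             # e.g. from LODES
area = {"06001400100": 10.0, "06001400200": 40.0}     # tract area, units assumed
dr_tracts = {"06001400100"}      # tracts with demand-responsive service
fixed_tracts = {"06001400200"}   # tracts with fixed-route service


def build_tract_table(population, jobs, area, dr_tracts, fixed_tracts):
    """Return one row per tract, using the column names from the table above."""
    rows = []
    for geoid in sorted(population):
        rows.append({
            "census_tract_geoid": geoid,
            "has_demand_responsive": geoid in dr_tracts,
            "has_fixed_route": geoid in fixed_tracts,
            "population": population[geoid],
            # Assumption: a tract missing from the jobs data has 0 jobs.
            "jobs": jobs.get(geoid, 0),
            "area": area.get(geoid),
        })
    return rows


table = build_tract_table(population, jobs, area, dr_tracts, fixed_tracts)
for row in table:
    print(row)
```

Keying every input on the tract GEOID keeps the join trivial; whether a missing jobs value should be 0 or left empty is a judgment call for the analysis.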
For data team meeting (from pairing w/ @Nkdiaz):
Intermediate next step:
Final piece:
has_demand_responsiveness, has_fixed_route
From pairing w/ @Nkdiaz, I just remembered that R has a very nice census library called tidycensus (as I recall, Python's census lib is a bit more low-level). Here is an R script for pulling out the data in the format @hunterowens listed above.
May need to tweak the variables we grab (I just grabbed a couple to fill in).
library(tidycensus)
library(tidyverse)
census_api_key("<API_KEY_HERE>")
# uncomment this to see possible variables to load
# load_variables(2019, "acs5", cache = TRUE) %>% View()
# get ACS data. modify variables argument to get additional measures
# we'll transform them to wide format so each variable will be its own column
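tracts <- get_acs(
  geography = "tract",
  state = "CA",
  year = 2019,
  # placeholder variables: confirm the codes with load_variables() before relying on them
  variables = c(ttl_population = "B01003_001", ttl_jobs = "B24011_001")
)
# drop the margin-of-error column, then spread each variable into its own column
wide_tracts <- tracts %>% select(-moe) %>% spread(variable, estimate)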
wide_tracts %>% write_csv("calitp_acs5_tracts.csv")
In order to calculate has_demand_responsive that Hunter lists above, we need to do the following...
Modes column to figure out if they have "DR" (for now, just check that DR is one of the modes listed)
City and County columns to map back to census data.
@e-lo for this piece...
Total ridership of demand-responsive transit services
Where do total ridership numbers come from? Is it somewhere in transitstacks?
edit: nvm, we found it. The NTD stats have UPT_DR, which @Nkdiaz is planning to use. Definitely let us know if there's a better source.
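The Modes check and the UPT_DR total discussed above could look roughly like this in Python. The sample rows, the comma-separated format of the Modes column, the `Agency` field, and the helper names `has_dr` and `summarize_by_county` are assumptions for illustration; only the Modes, City, County, and UPT_DR column names come from this thread.

```python
from collections import defaultdict

# Hypothetical provider rows (made-up agencies, places, and ridership numbers).
providers = [
    {"Agency": "A", "Modes": "MB, DR", "City": "City 1", "County": "County X", "UPT_DR": 1200},
    {"Agency": "B", "Modes": "MB", "City": "City 2", "County": "County X", "UPT_DR": 0},
    {"Agency": "C", "Modes": "DR", "City": "City 3", "County": "County Y", "UPT_DR": 300},
]


def has_dr(modes):
    """True if "DR" is one of the listed modes (assumes comma-separated codes)."""
    return "DR" in {m.strip().upper() for m in modes.split(",")}


def summarize_by_county(rows):
    """Per county: is there any DR provider, and what is the summed UPT_DR?"""
    summary = defaultdict(lambda: {"has_demand_responsive": False, "upt_dr": 0})
    for row in rows:
        entry = summary[row["County"]]
        if has_dr(row["Modes"]):
            entry["has_demand_responsive"] = True
            entry["upt_dr"] += row["UPT_DR"]
    return dict(summary)


county_summary = summarize_by_county(providers)
print(county_summary)
```

Matching whole mode codes rather than searching for the substring "DR" avoids false positives from any longer code that happens to contain those letters. Mapping counties down to census tracts is a separate step not shown here.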
Blocked until the data source for demand responsive provider info is provided. See this slack thread.
Here is the notebook Natalya worked on for this: https://github.com/cal-itp/data-analyses/blob/main/gtfs_flex_research_questions/calitp_flex_research_demand_responsive.ipynb
The draft of this analysis is complete. Let's wait until other needs arise before re-opening.
see this related issue: cal-itp/data-analyses#171
@Nkdiaz: With the notebook, I see a couple of datasets that could be documented in intake:
The demand responsive data is fairly analysis-specific... but it comes from a Google Sheet and not our warehouse? If it's not elsewhere in the warehouse and this is a canonical source, then maybe it should either live in the warehouse or a GCS bucket and be documented in intake.
Journey-planning applications have asked for some data points to help prioritize GTFS-Flex capabilities. Some data points which would help demonstrate the importance of demand-responsive transit:
Priority
Med-high: This will feed into [unnamed journey planning application]'s prioritization process. There isn't a specific deadline so much as "as soon as we provide the data, they can start the process that will then start a process."
Data Sources