Open Abby-Wheelis opened 2 months ago
FYI, I think that the uprm-civic also has replaced mode (scootershare).
Hi everyone! You currently use overpass-api.de, but @Abby-Wheelis, you said you need routing. Please see this transitland route in Denver, CO: https://www.transit.land/routes/r-9xj3-h Is this what we need?
"public transport routing ... requires timetable data to work properly, and OSM doesn't have that." https://www.reddit.com/r/openstreetmap/comments/v914h0/comment/ibttco1/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
EDIT: I found that OSM does have transit routes; for example, this Overpass query returns the bus route relations in Gainesville:
[out:json];
area[name="Gainesville"]->.searchArea;
(
relation["route"="bus"](area.searchArea);
);
out body;
>;
out skel qt;
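For reference, here is a minimal sketch of running that query from Python. It assumes the public endpoint `https://overpass-api.de/api/interpreter` and the usual Overpass JSON layout (`elements`, each with `type`, `id`, `tags`); the function names are hypothetical and not part of the existing codebase.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Overpass endpoint; swap in another instance if needed.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_bus_route_query(area_name):
    """Return the Overpass QL query from the comment above for a given area name."""
    return (
        "[out:json];\n"
        f'area[name="{area_name}"]->.searchArea;\n'
        "(\n"
        '  relation["route"="bus"](area.searchArea);\n'
        ");\n"
        "out body;\n"
        ">;\n"
        "out skel qt;"
    )


def fetch_overpass(query, url=OVERPASS_URL, timeout=60):
    """POST the query to Overpass and return the decoded JSON (needs network access)."""
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(url, data=data, timeout=timeout) as resp:
        return json.load(resp)


def bus_route_names(overpass_json):
    """Collect the names of bus route relations from an Overpass JSON response."""
    names = []
    for element in overpass_json.get("elements", []):
        tags = element.get("tags", {})
        if element.get("type") == "relation" and tags.get("route") == "bus":
            # Fall back to the relation id when the route has no name tag.
            names.append(tags.get("name", f"relation {element['id']}"))
    return names
```

Calling `bus_route_names(fetch_overpass(build_bus_route_query("Gainesville")))` would list the routes, but note that this only describes where the routes run; it does not include timetables.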
Some places provide realtime data, such as Boston's MBTA: https://www.mbta.com/developers/gtfs-realtime Can we find a free opensource resource that gives all available GTFS Realtime data sources worldwide?
https://github.com/MobilityData/awesome-transit?tab=readme-ov-file https://mobilitydatabase.org/feeds/mdb-1602 https://mobilitydata.github.io/mobility-feed-api/SwaggerUI/index.html https://docs.opentripplanner.org/en/latest/
@jpfleischer I meant we need routing in the sense of:
I went from my house to the drugstore by car. I want to be able to run a query (ideally via API) that will give me the time and cost of the alternatives (e.g. the equivalent of this, but with cost included).
OSM has transit data, and we use transit data from it using overpass for mode detection (look at `emission/net/ext_services`), but it doesn't do routing. OSM-based routing services such as OSRM or GraphHopper typically do not support transit, so we cannot use them to find transit alternatives.
There is an open-source routing engine that takes transit into account (Open Trip Planner) https://opentransitsoftwarefoundation.org/
We are friends with the OTP folks and have tried using their software before. But for us to use this in a production system, somebody still needs to run the software, load the data, keep it updated, etc. Ideally, there would be an overpass-like system that we could use for routing and that we could pay for if needed. But I am not sure that such a Google Maps alternative exists.
Can we find a free opensource resource that gives all available GTFS Realtime data sources worldwide?
transit.land is intended to do that, at least for the US. But somebody needs to load that data.
One final comment on this: wrt the framing of this problem, we have discussed how there are people's preferences (which are related to the person) and the alternatives (which are related to the environment).
So the same person may make different choices in a different environment (e.g. @jpfleischer taking transit in Boston but not in FL) even though their internalized preferences have not changed.
Just wanted to highlight the flip side of that, which is that different people can have different preferences. While @jpfleischer would not ever take the bus in FL, there are clearly people who do (otherwise, the bus system would have shut down).
For the replaced mode project, we want to understand individual or group preferences, specifically as a set of factors that influence their (assumed rational) choices. We can then apply those preferences to a different set of alternatives (new transit line, no e-bike available, parking restrictions...) and get a sense of how they will behave, and by extension, what the impact of the modification to the alternatives is.
@jpfleischer Here is a PR related to the NTD data processing and integration for energy and emissions; maybe similar methods would allow us to extract transit cost? e-mission-common PR
I think the notebooks in metrics/footprint/.archive could be a good place to start.
@Abby-Wheelis Average fare collected per passenger is a column here https://data.transportation.gov/Public-Transit/2022-NTD-Annual-Data-Metrics/ekg5-frzt/explore
For frequency - NTD glossary defines "Headway" as "The time interval between vehicles moving in the same direction on a particular route. Can be found in: S-10" - now if I can just figure out where S-10 is...
S-10 is a form that agencies fill out for reporting to NTD. The 2023 version here includes many of the fields that we saw in the data table, with time periods and when they are active (AM peak, Sunday, etc.), but unfortunately I don't see "Headway" in the form or the data table.
I have not been able to find service frequency or headway directly, but I did find a paper (from 2011) describing methods for evaluating performance using NTD data, "System for Transit Performance Analysis Using the National Transit Database", notably:
Average Headway (in minutes). This is an important measure of service frequency. It is computed by first dividing the total directional route mileage from Form S-10 by the system’s calculated average speed, as defined above, to obtain an estimate of the number of hours it takes to traverse the entire system’s total route miles. This time (in hours) is then divided by the system’s average weekday total vehicles from Form S-10 to determine the amount of time in hours it takes for a vehicle to complete its portion of the total route miles, one time. The resulting time is then multiplied by 60 for conversion from hours to minutes.
It is true that GTFS agencies publish their stop times a lot more frequently than they publish their fares. However, as @Abby-Wheelis has found, the paper documents a way to estimate headway from NTD data, and applying that logic will be more straightforward (after verifying its accuracy and reasoning).
It would be quite complicated to get the stop times also because there is no NTD ID in the GTFS data, only the stop coordinates, so we would have to add logic to convert coordinates to UACE.
We may consider comparing both options if time allows, but for now, we will just do the NTD headway calculation.
A few new notes from our meeting today:
fare: {"am_peak":15, "sunday":90 ...}
for each mode/agency
Given that the pseudoformula I found in the paper is fairly complicated, I just wanted to think through it as a sanity check, to make sure we agree with it before trying to implement it:
Average Speed. This is the average speed of vehicles in revenue service operation (i.e., not including travel to and from the garage or any other deadhead) and is calculated by dividing the total actual vehicle (for non-rail modes) or train (for rail modes) revenue miles by the total actual vehicle/train revenue hours. Both of the variables come from Form S-10.
Average Headway (in minutes). This is an important measure of service frequency. It is computed by first dividing the total directional route mileage from Form S-10 by the system’s calculated average speed, as defined above, to obtain an estimate of the number of hours it takes to traverse the entire system’s total route miles. This time (in hours) is then divided by the system’s average weekday total vehicles from Form S-10 to determine the amount of time in hours it takes for a vehicle to complete its portion of the total route miles, one time. The resulting time is then multiplied by 60 for conversion from hours to minutes.
speed = revenue miles / revenue hours
headway = (directional mileage / speed) / num vehicles = time taken to cover entire system / num vehicles = how long it would take one vehicle to complete a lap? so how often it arrives?
I'm not sure I've gotten my head wrapped around this formula yet. If anyone sees it differently, please feel free to let me know how we should interpret it.
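To make the sanity check concrete, here is a minimal Python sketch of the two quoted definitions. The function and parameter names are hypothetical, and the numbers in the usage note are made up for illustration, not taken from NTD. One possible reading of the formula: if the vehicles are spread evenly over the directional route miles, each vehicle "owns" `miles / vehicles` of route, and the time to cover that stretch at average speed is the gap between consecutive vehicles passing a point.

```python
def average_speed_mph(revenue_miles, revenue_hours):
    """Average speed in revenue service: revenue miles / revenue hours."""
    return revenue_miles / revenue_hours


def average_headway_minutes(directional_route_miles, revenue_miles,
                            revenue_hours, weekday_vehicles):
    """Average headway as described in the quoted paper, in minutes."""
    speed = average_speed_mph(revenue_miles, revenue_hours)
    # Hours needed to traverse all directional route miles once.
    hours_to_cover_system = directional_route_miles / speed
    # Hours for one vehicle to cover its share of the route miles.
    hours_per_vehicle = hours_to_cover_system / weekday_vehicles
    return hours_per_vehicle * 60
```

For example, with 1,200,000 revenue miles over 100,000 revenue hours (12 mph), 240 directional route miles, and 40 weekday vehicles, `average_headway_minutes(240, 1_200_000, 100_000, 40)` gives 30 minutes.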
Abby is right. We have now reused the logic of the preexisting NTD script to add fares, fixing a bug along the way to get it to work. With regard to the coordinate-to-fare return function, we are considering:
I think half of the preexisting function can be generalized, but for now, I will go with the first option.
We also anticipate that, since the fare information is only attached to the UACE, we will calculate a general average fare across the entire UACE, weighted by number of passenger trips, to return fare information for a particular coordinate.
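A minimal sketch of that trip-weighted average, assuming each agency in the UACE is represented as a dict with hypothetical keys `avg_fare` and `passenger_trips` (the real NTD column names will differ):

```python
def weighted_average_fare(agencies):
    """Average fare across agencies in one UACE, weighted by passenger trips."""
    total_trips = sum(a["passenger_trips"] for a in agencies)
    if total_trips == 0:
        return None  # no ridership reported, so no meaningful average
    total_fare_revenue = sum(a["avg_fare"] * a["passenger_trips"] for a in agencies)
    return total_fare_revenue / total_trips
```

For instance, an agency with a 2.00 average fare and 3,000 trips combined with one at 4.00 and 1,000 trips yields 2.50, pulled toward the busier agency.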
@Abby-Wheelis @jpfleischer have you started working with the OTP yet? We definitely need travel time as well. I wonder if the OTP API supports any generic queries related to GTFS and headways, similar to https://nycplanning.github.io/td-travelshed/mapbox/public/
Short-term goals:
We now have a mechanism to launch OpenTripPlanner within a docker container and to build an instance for Denver's RTD.
The shortcoming is that the GTFS source has to be specified manually. But since I know how to pull the GTFS links from Mobility Database by state, we can combine the two projects into an automated GTFS fetcher and an automated transit time calculator (fetched from the OTP API on our local docker instance).
The next step is to write the OTP API logic that returns transit times for a given pair of coordinates.
An issue is that, currently, transit times cannot be calculated for trips from more than several months in the past. A potential solution is using Mobility Database to pull historical GTFS.
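As a rough sketch of what that logic could look like: the code below assumes the local instance listens on port 8080 and exposes OTP's legacy REST planner at `/otp/routers/default/plan`, with a response shaped like `{"plan": {"itineraries": [{"duration": <seconds>}]}}`. Both are assumptions about the running version (newer OTP releases favor GraphQL APIs), the accepted `date`/`time` formats should be checked against the OTP docs, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: local Docker instance with the legacy REST planner enabled.
OTP_PLAN_URL = "http://localhost:8080/otp/routers/default/plan"


def build_plan_url(origin, destination, date, time, base_url=OTP_PLAN_URL):
    """Build a plan request URL from (lat, lon) pairs and a departure date/time."""
    params = {
        "fromPlace": f"{origin[0]},{origin[1]}",
        "toPlace": f"{destination[0]},{destination[1]}",
        "date": date,
        "time": time,
        "mode": "TRANSIT,WALK",
    }
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch_plan(url, timeout=60):
    """GET the plan from the local OTP instance (needs the container running)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def fastest_duration_minutes(plan_response):
    """Return the shortest itinerary duration in minutes, or None if no itinerary."""
    itineraries = plan_response.get("plan", {}).get("itineraries", [])
    if not itineraries:
        return None
    return min(it["duration"] for it in itineraries) / 60
```

Keeping URL construction and response parsing separate from the network call makes both testable without a running OTP instance.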
GTFS is bad for fare, great for stop times. OTP uses GTFS feeds as input-- the developer has to specify these GTFS feeds and provide them. OTP is great to serve as a local calculator and provider of transit times. No reliance on external API or website needed. OTP even appears to use OneBusAway in its logic.
Mobility Database only has records for RTD Denver dating back to Feb 2024: https://mobilitydatabase.org/feeds/mdb-178
However, I downloaded the GTFS using the Wayback Machine and then successfully calculated trips for 2022. The URL was the same in 2022 as it is now, which let me add that feed to my OTP instance.
...however, it is much more straightforward to use the older version of Mobility Database, called OpenMobilityData, https://transitfeeds.com/p/rtd-denver/188?p=26 to get the old GTFS files.
With ~60 GTFS zip files, OTP takes many hours to fully start up. The same is likely true for r5, as its Python wrapper is shown here taking a while to initialize.
Possible workarounds are combining all the GTFS files into one using a merge tool such as gtfsmerge, or, instead of using one GTFS file per month of each year, using maybe one every three months.
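The second workaround could be sketched like this, assuming we already have a list of `(feed_date, filename)` pairs for the downloaded zips. The filenames and the choice to keep the earliest feed in each calendar quarter are illustrative assumptions.

```python
from datetime import date


def one_feed_per_quarter(feeds):
    """Keep the earliest feed in each calendar quarter.

    feeds: iterable of (feed_date, filename) tuples, in any order.
    Returns the kept filenames in chronological order.
    """
    chosen = {}
    for feed_date, filename in sorted(feeds):
        quarter = (feed_date.year, (feed_date.month - 1) // 3)
        # setdefault only stores the first (earliest) feed seen per quarter.
        chosen.setdefault(quarter, filename)
    return [chosen[quarter] for quarter in sorted(chosen)]
```

With monthly feeds this cuts ~60 files to ~20, at the cost of routing some trips against a schedule that is up to three months stale.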
The most centralized status of this objective is located at the README.md at my e-mission-common fork: https://github.com/jpfleischer/e-mission-common/blob/03b789d344ab9d55ec1d6b6bd668262e33003401/scripts/otp/README.md?plain=1#L1-L14
The most crucial aspect is using historical GTFS data. Right now that data lives on AWS servers belonging to OpenMobilityData. The website has a banner on its front page declaring that it is deprecated. I am hoping that this data does not disappear because it is quite crucial.
It would be ideal, upon returning to this objective, to save a json of all the agencies, as in:
rtd-denver: https://transitfeeds.com/p/rtd-denver/188
miami-dade-county-transit: https://transitfeeds.com/p/miami-dade-county-transit/48
# .... and so on .....
We scrape because OpenMobilityData no longer gives out API keys.
The URL value in the key-value pairing (taken from above) is needed, and the existing logic in https://github.com/jpfleischer/e-mission-common/blob/master/scripts/otp/scrape.ipynb takes care of the rest.
We could ideally get more agencies for Colorado but as of now, we only use RTD Denver.
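A minimal sketch of saving and reloading that agency-to-URL mapping as JSON. The two entries are the ones listed above; the file name and helper names are hypothetical.

```python
import json

# Agency slug -> OpenMobilityData (transitfeeds.com) page, as listed above.
AGENCY_FEEDS = {
    "rtd-denver": "https://transitfeeds.com/p/rtd-denver/188",
    "miami-dade-county-transit": "https://transitfeeds.com/p/miami-dade-county-transit/48",
}


def save_agency_feeds(path, feeds=AGENCY_FEEDS):
    """Write the agency mapping to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(feeds, f, indent=2, sort_keys=True)


def load_agency_feeds(path):
    """Read the agency mapping back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

The scraping notebook could then iterate over `load_agency_feeds("agency_feeds.json").items()` instead of hard-coding a single agency.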
There are two main components to predicting mode choice with a choice model:
1) The choice model itself, representing people's preferences, i.e. Abby's preferences for travel are: (time: -2, cost: -5, fun: 10)
2) The possible modes for the trip, i.e. [Car: (cost=15$, time=10min, fun=0), E-bike: (cost=1$, time=15min, fun=1), Walk: (cost=0$, time=60min, fun=-1)]
With these factors we can predict that Abby would choose e-bike (-25) and without the e-bike would choose car(-95) but wouldn't choose walk (-130), approximately.
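The arithmetic behind those scores can be written as a small Python sketch: a linear utility (preference weight times attribute value, summed) per mode, then a ranking. The helper names are hypothetical; the numbers are the ones from the example.

```python
PREFERENCES = {"time": -2, "cost": -5, "fun": 10}

ALTERNATIVES = {
    "car": {"cost": 15, "time": 10, "fun": 0},
    "e-bike": {"cost": 1, "time": 15, "fun": 1},
    "walk": {"cost": 0, "time": 60, "fun": -1},
}


def utility(preferences, attributes):
    """Linear utility: sum of preference weight * attribute value."""
    return sum(weight * attributes[name] for name, weight in preferences.items())


def rank_modes(preferences, alternatives):
    """Return (mode, utility) pairs sorted from most to least preferred."""
    scores = {mode: utility(preferences, attrs) for mode, attrs in alternatives.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

`rank_modes(PREFERENCES, ALTERNATIVES)` puts e-bike first (-25), then car (-95), then walk (-130); dropping the e-bike from `ALTERNATIVES` makes car the top choice.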
As a baseline, we want to build a logistic regression model, since that is what is most commonly used in research and planning to model mode choice (e.g. what would the ridership returns on this transit investment be like?).
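As a sketch of how a logit-style model turns utilities into choice probabilities, the snippet below applies a softmax over per-mode utilities. This assumes the multinomial logit form commonly used for mode choice; it is not necessarily the exact model we will fit, and the function name is hypothetical.

```python
import math


def choice_probabilities(utilities):
    """Softmax over utilities: P(mode) = exp(U_mode) / sum of exp(U) over all modes."""
    best = max(utilities.values())
    # Subtracting the max keeps exp() numerically stable without changing the result.
    exp_scores = {mode: math.exp(u - best) for mode, u in utilities.items()}
    total = sum(exp_scores.values())
    return {mode: score / total for mode, score in exp_scores.items()}
```

With the illustrative utilities from the example (-25, -95, -130) the gaps are so large that the e-bike probability is essentially 1; coefficients fitted from real data would typically be on a scale that yields less extreme probabilities.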
We have ground truth data about 2nd choice modes, through the replaced mode collected by programs that have a mode of interest, often e-bike. This is used to show the impact of the mode of interest, through things like emissions savings/reductions, which we map on the public dashboard.
To build up the alternatives, we'll need a few different pieces of data, which could be complex to figure out:
@shankari @jpfleischer for visibility