dcjohnson24 opened this issue 2 years ago
On trips crossing the hour boundary: do we suspect that this code double-counts a trip that crosses an hour boundary, even though vid is aggregated as a set?

I think that's the code. My guess is that vid is unique only within a given hour, so the same trip could appear again in another hour. It does seem strange, though.
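The suspected mechanism can be reproduced with a small sketch. It assumes (these are assumptions, not the project's code) that trips are counted as the number of distinct vid values per route and hour, and that a daily total is obtained by summing those hourly counts. The data, column names (`route_id`, `vid`, `timestamp`) and route numbers are hypothetical.

```python
import pandas as pd

# Hypothetical real-time pings: vehicle 1 makes one trip on route 22 from
# 8:50 to 9:10, so it appears in both the 8:00 and the 9:00 hour.
pings = pd.DataFrame(
    {
        "route_id": ["22", "22", "22", "22"],
        "vid": [1, 1, 1, 2],
        "timestamp": pd.to_datetime(
            [
                "2022-05-20 08:50",
                "2022-05-20 09:05",
                "2022-05-20 09:10",
                "2022-05-20 09:30",
            ]
        ),
    }
)
pings["hour"] = pings["timestamp"].dt.hour

# Aggregate vid as a set within each (route, hour) and count its members.
hourly = (
    pings.groupby(["route_id", "hour"])["vid"]
    .agg(lambda s: len(set(s)))
    .rename("trip_count")
    .reset_index()
)

# Summing the hourly counts counts vehicle 1 once per hour it appears in.
daily_from_hourly = hourly.groupby("route_id")["trip_count"].sum()

# Counting distinct vids over the whole day counts it only once.
daily_unique = pings.groupby("route_id")["vid"].nunique()

print(daily_from_hourly["22"], daily_unique["22"])  # 3 2
```

The set removes duplicates only inside each hour, so nothing stops the same vid from being counted again in the next hour. Counting distinct vid values per day is only a stand-in here: one vehicle usually makes several trips a day, so a real fix would need a per-trip key.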
Investigate routes with ratio > 1
There are some routes that have a ratio of actual trips to scheduled trips greater than one, and it would be good to know why.
Access the data
Jupyter Notebook
To access the data, run the notebook `compare_scheduled_and_rt.ipynb`. Add a cell at the bottom containing `%store summary` and run it. The `%store` magic command lets you share variables between notebooks (https://stackoverflow.com/questions/31621414/share-data-between-ipython-notebooks).

Next, run the notebook `static_gtfs_analysis.ipynb`. Add a cell at the bottom containing `%store -r summary` and run it to read the `summary` DataFrame from the `compare_scheduled_and_rt.ipynb` notebook. Then merge the `summary` DataFrame with the `final_gdf` GeoDataFrame in `static_gtfs_analysis.ipynb` using `summary_gdf = summary.merge(final_gdf, how="right", on="route_id")`.
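What `how="right"` does in that merge can be seen on toy data. This is a sketch with hypothetical values: `summary` is reduced to a `route_id` and a `ratio` column, and `final_gdf` (a GeoDataFrame in the notebook) is replaced by a plain DataFrame with a placeholder `shape` column so the example runs without geopandas.

```python
import pandas as pd

# Hypothetical stand-in for `summary`: actual/scheduled trip ratio per route.
summary = pd.DataFrame({"route_id": ["3", "22"], "ratio": [0.95, 1.10]})

# Hypothetical stand-in for `final_gdf`; the real one carries geometries.
final_gdf = pd.DataFrame(
    {"route_id": ["3", "22", "152"], "shape": ["shape_a", "shape_b", "shape_c"]}
)

# how="right" keeps every route in final_gdf, in final_gdf's order.
# Routes with no row in summary (here route 152) get NaN for ratio.
summary_gdf = summary.merge(final_gdf, how="right", on="route_id")
print(summary_gdf)
```

One thing worth checking in the notebook: because `merge` is called on the plain `summary` DataFrame, the result may come back as a plain DataFrame rather than a GeoDataFrame. Calling `final_gdf.merge(summary, how="left", on="route_id")` keeps the same rows while putting the GeoDataFrame on the left, which is what geopandas expects for attribute joins.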
Python
Run the following in an interpreter from the project root:
Find routes with ratio > 1
To filter the rows with ratio > 1, use

A few things to look for:
- ratio > 1 after reaggregating the data at a different frequency, e.g. daily (see #12)
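Reaggregating to daily totals and then filtering for ratio > 1 could look like the sketch below. The hourly table and its column names (`datetime`, `actual_trip_count`, `scheduled_trip_count`) are hypothetical and not taken from the project. Note that if actual trips are counted per hour, summing them into a day inherits the hour-boundary double counting discussed in the comments above.

```python
import pandas as pd

# Hypothetical hourly counts of actual (real-time) and scheduled trips per route.
hourly = pd.DataFrame(
    {
        "route_id": ["22", "22", "22", "3", "3"],
        "datetime": pd.to_datetime(
            [
                "2022-05-20 08:00",
                "2022-05-20 09:00",
                "2022-05-21 08:00",
                "2022-05-20 08:00",
                "2022-05-21 08:00",
            ]
        ),
        "actual_trip_count": [6, 7, 5, 4, 4],
        "scheduled_trip_count": [6, 6, 5, 5, 4],
    }
)

# Reaggregate to daily totals per route.
hourly["date"] = hourly["datetime"].dt.date
daily = hourly.groupby(["route_id", "date"], as_index=False)[
    ["actual_trip_count", "scheduled_trip_count"]
].sum()
daily["ratio"] = daily["actual_trip_count"] / daily["scheduled_trip_count"]

# Keep only the route-days where more trips ran than were scheduled.
over = daily.loc[daily["ratio"] > 1]
print(over)
```

The same boolean-mask pattern, `df.loc[df["ratio"] > 1]`, applies to any table with a `ratio` column.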