UDST / urbanaccess

A tool for GTFS transit and OSM pedestrian network accessibility analysis by UrbanSim
https://udst.github.io/urbanaccess/index.html
GNU Affero General Public License v3.0

Two feeds not passed into network #84

Closed. marcbosch-idencity closed this issue 3 years ago

marcbosch-idencity commented 3 years ago

I am trying to create a network with all train lines in Spain, but two feeds are not passed into the network: they download properly, but they are not included when I create the network. The two GTFS feeds giving me problems are the following.

GTFS feeds

https://ssl.renfe.com/ftransit/Fichero_CER_FOMENTO/fomento_transit.zip

https://www.fgc.cat/google/google_transit.zip

Environment

The code I'm running is found here

https://github.com/marcbosch-idencity/urbanaccess-example/blob/main/longitudes_frecuencias_trenes.ipynb

Here is a specific script only for one of the GTFS feeds giving me problems.

https://github.com/marcbosch-idencity/urbanaccess-example/blob/main/red_Cercanias_feve.ipynb

When I run this call:

ua.gtfs.network.create_transit_net(gtfsfeeds_dfs=loaded_feeds,
                                   day='tuesday',
                                   timerange=['00:00:00', '23:59:59'],
                                   calendar_dates_lookup=None)

I get the following error:


WARNING: Time range passed: ['00:00:00', '23:59:59'] is a 23 hour period. Long periods over 3 hours may take a significant amount of time to process.
Using calendar to extract service_ids to select trips.
48 service_ids were extracted from calendar
14,057 trip(s) 14.47 percent of 97,135 total trip records were found in calendar for GTFS feed(s): ['cercanias']
NOTE: If you expected more trips to have been extracted and your GTFS feed(s) have a calendar_dates file, consider utilizing the calendar_dates_lookup parameter in order to add additional trips based on information inside of calendar_dates. This should only be done if you know the corresponding GTFS feed is using calendar_dates instead of calendar to specify service_ids. When in doubt do not use the calendar_dates_lookup parameter.
14,057 of 97,135 total trips were extracted representing calendar day: tuesday. Took 0.09 seconds
There are no departure time records missing from trips following the specified schedule. There are no records to interpolate.
Difference between stop times has been successfully calculated. Took 0.00 seconds

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-8-356b49904f86> in <module>
      2                                    day='tuesday',
      3                                    timerange=['00:00:00', '23:59:59'],
----> 4                                    calendar_dates_lookup=None)

C:\ProgramData\Anaconda3\envs\gds\lib\site-packages\urbanaccess\gtfs\network.py in create_transit_net(gtfsfeeds_dfs, day, timerange, calendar_dates_lookup, overwrite_existing_stop_times_int, use_existing_stop_times_int, save_processed_gtfs, save_dir, save_filename)
    156         df=gtfsfeeds_dfs.stop_times_int,
    157         starttime=timerange[0],
--> 158         endtime=timerange[1])
    159 
    160     final_edge_table = _format_transit_net_edge(

C:\ProgramData\Anaconda3\envs\gds\lib\site-packages\urbanaccess\gtfs\network.py in _time_selector(df, starttime, endtime)
    709         '.2f} seconds'.format(
    710             starttime, endtime, len(selected_stop_timesdf), len(df),
--> 711             (len(selected_stop_timesdf) / len(df)) * 100,
    712             time.time() - start_time))
    713 

ZeroDivisionError: division by zero

When I run the notebook with all feeds, the script does not raise any errors; it simply does not include the stops from the two 'problematic' feeds in the network.

sablanchard commented 3 years ago

Hi @marcbosch-idencity! I took a quick look at the specific feed you are using in red_Cercanias_feve.ipynb, and I see that there are spaces in the trip_id values in the stop_times file, while the trip_id values in the trips file have no spaces. When UrbanAccess does its lookup and join on this column, it's technically seeing different values because of the spaces. As a result, no records are found when the lookup is performed, which leads to the error you posted.
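To illustrate the mismatch described above, here is a minimal sketch using made-up trip_id values (not taken from the actual feed). It assumes only pandas and shows that an exact-match join on trip_id returns no rows when one side carries padding, and how such padded values could be counted.

```python
import pandas as pd

# Toy stand-ins for trips.txt and stop_times.txt; the values are illustrative only.
trips = pd.DataFrame({"trip_id": ["1001", "1002"]})
stop_times = pd.DataFrame({
    "trip_id": ["1001 ", " 1002", "1002 "],  # padded with spaces
    "stop_id": ["A", "B", "C"],
})

# An exact-match join on trip_id treats "1001 " and "1001" as different keys.
joined = stop_times.merge(trips, on="trip_id", how="inner")
n_joined = len(joined)

# Count the values that carry leading or trailing whitespace.
padded = stop_times["trip_id"] != stop_times["trip_id"].str.strip()
n_padded = int(padded.sum())

print(n_joined, n_padded)
```

An empty join result like this one is consistent with the len(df) of zero behind the ZeroDivisionError in the traceback.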

I'll put together a PR for the next release that adds whitespace checks and corrections on the columns used for GTFS file relations, so that whitespace is removed automatically from those columns.

In the meantime, you can pre-process the GTFS files that have this discrepancy by stripping whitespace from the affected columns. For example, for trip_id: df['trip_id'] = df['trip_id'].str.rstrip().str.lstrip()
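A slightly more general version of that one-liner might look like the sketch below. The helper name strip_key_columns and the list of ID columns are hypothetical and not part of UrbanAccess. It assumes the GTFS tables have been read into pandas DataFrames with the ID columns as strings, and it uses .str.strip(), which removes both leading and trailing whitespace in one call.

```python
import pandas as pd

# Hypothetical selection of ID columns used to relate GTFS files; adjust per feed.
KEY_COLUMNS = ("trip_id", "route_id", "service_id", "stop_id")


def strip_key_columns(df, columns=KEY_COLUMNS):
    """Return a copy of df with surrounding whitespace removed from the given columns."""
    out = df.copy()
    for col in columns:
        # Skip columns that are absent or not string-typed (e.g. numeric IDs).
        if col in out.columns and pd.api.types.is_string_dtype(out[col]):
            out[col] = out[col].str.strip()
    return out


# Illustrative data only, not taken from the real feed.
trips = pd.DataFrame({"trip_id": ["1001", "1002"]})
stop_times = pd.DataFrame({
    "trip_id": ["1001 ", " 1002", "1002 "],
    "stop_id": ["A", "B", "C"],
})

cleaned = strip_key_columns(stop_times)
rejoined = cleaned.merge(trips, on="trip_id", how="inner")
```

After cleaning, every stop_times row in this toy example finds its trip again. The same function would be applied to each table that shares the affected column before the feed is loaded.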

marcbosch-idencity commented 3 years ago

That worked with the cercanías_Feve feed, so thank you very much!

sablanchard commented 3 years ago

PR to fix cases like this is here: https://github.com/UDST/urbanaccess/pull/85