Bondify / gtfs_functions

Package with useful functions to create geo-spatial visualizations from a GTFS.
MIT License

Pandas ValueError on computing line frequencies #48

Open fxjung opened 6 months ago

fxjung commented 6 months ago

I tried to follow the README example for computing line frequencies, using a large GTFS feed that covers all of Germany:

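from gtfs_functions import Feed

# gtfs_path: path to the downloaded GTFS zip (definition not shown)
time_windows = [0, 6, 9, 15.5, 19, 22, 24]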

feed = Feed(
    str(gtfs_path),
    time_windows=time_windows,
    start_date="2024-02-22",
    end_date="2024-02-23",
)
line_freq = feed.lines_freq
line_freq.head()

Unfortunately, this fails with the following error/trace:

INFO:root:Reading "stop_times.txt".
INFO:root:get trips in stop_times
INFO:root:accessing trips
INFO:root:Reading "routes.txt".
INFO:root:Reading "trips.txt".
INFO:root:Reading "calendar.txt".
INFO:root:Reading "calendar_dates.txt".
INFO:root:The busiest date/s of this feed or your selected date range is/are:  ['2024-02-23'] with 854144 trips.
INFO:root:In the case that more than one busiest date was found, the first one will be considered.
INFO:root:In this case is 2024-02-23.
INFO:root:Reading "stop_times.txt".
INFO:root:_trips is defined in stop_times
INFO:root:Reading "stops.txt".
INFO:root:computing patterns
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_795641/934594691.py in ?()
----> 1 line_freq = feed.lines_freq
      2 line_freq.head()

~/anaconda3/envs/gendev/lib/python3.10/site-packages/gtfs_functions/gtfs_functions.py in ?(self)
    224     @property
    225     def lines_freq(self):
    226         if self._lines_freq is None:
--> 227             self._lines_freq = self.get_lines_freq()
    228 
    229         return self._lines_freq

~/anaconda3/envs/gendev/lib/python3.10/site-packages/gtfs_functions/gtfs_functions.py in ?(self)
    786         Returns the bus frequency in minutes/bus broken down by
    787         time window.
    788         """
    789 
--> 790         stop_times = self.stop_times
    791         shapes = self.shapes
    792         cutoffs = self.time_windows
    793 

~/anaconda3/envs/gendev/lib/python3.10/site-packages/gtfs_functions/gtfs_functions.py in ?(self)
    203     @property
    204     def stop_times(self):
    205         if self._stop_times is None:
--> 206             self._stop_times = self.get_stop_times()
    207 
    208         return self._stop_times

~/anaconda3/envs/gendev/lib/python3.10/site-packages/gtfs_functions/gtfs_functions.py in ?(self)
    675             logging.info('_trips is defined in stop_times')
    676             trips = self._trips
    677         else:
    678             logging.info('get trips in stop_times')
--> 679             trips = self.trips
    680         stops = self.stops
    681 
    682         # Fix data types

~/anaconda3/envs/gendev/lib/python3.10/site-packages/gtfs_functions/gtfs_functions.py in ?(self)
    175         if self._trips is None:
    176             self._trips = self.get_trips()
    177 
    178         if self._patterns and self._trips_patterns is None:
--> 179             (trips_patterns, routes_patterns) = self.get_routes_patterns(
    180                     self._trips)
    181             self._trips_patterns = trips_patterns
    182             self._routes_patterns = routes_patterns

~/anaconda3/envs/gendev/lib/python3.10/site-packages/gtfs_functions/gtfs_functions.py in ?(self, trips)
    391         def version_hash(x):
    392             hash = hashlib.sha1(f"{x.route_id}{x.direction_id}{str(x.zipped_stops)}".encode("UTF-8")).hexdigest()
    393             return hash[:18]
    394 
--> 395         trips_with_stops['pattern_id'] = trips_with_stops.apply(
    396             version_hash, axis=1)
    397 
    398         # Count number of trips per pattern to identify the main one

~/anaconda3/envs/gendev/lib/python3.10/site-packages/pandas/core/frame.py in ?(self, key, value)
   4285             self._setitem_frame(key, value)
   4286         elif isinstance(key, (Series, np.ndarray, list, Index)):
   4287             self._setitem_array(key, value)
   4288         elif isinstance(value, DataFrame):
-> 4289             self._set_item_frame_value(key, value)
   4290         elif (
   4291             is_list_like(value)
   4292             and not self.columns.is_unique

~/anaconda3/envs/gendev/lib/python3.10/site-packages/pandas/core/frame.py in ?(self, key, value)
   4443 
   4444             return self.isetitem(locs, value)
   4445 
   4446         if len(value.columns) > 1:
-> 4447             raise ValueError(
   4448                 "Cannot set a DataFrame with multiple columns to the single "
   4449                 f"column {key}"
   4450             )

ValueError: Cannot set a DataFrame with multiple columns to the single column pattern_id

Am I doing anything wrong? For reference, my pandas version:

>>> import pandas as pd
>>> pd.__version__
'2.2.0'
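
A note on what the message itself means, as a minimal sketch and not a confirmed diagnosis for this feed: pandas raises this ValueError whenever the right-hand side of a single-column assignment is a DataFrame with more than one column. So the `apply(version_hash, axis=1)` call in the traceback apparently returned a DataFrame where a Series was expected. One situation where `DataFrame.apply(..., axis=1)` can do that, depending on the pandas version and the function, is an empty frame; whether `trips_with_stops` was actually empty here is an assumption. The sample data and the `add_pattern_id` helper below are hypothetical and not part of gtfs_functions.

```python
import hashlib

import pandas as pd


def version_hash(row):
    # Same idea as the hashing step in the traceback: a short id per
    # (route_id, direction_id, stop sequence) combination.
    key = f"{row.route_id}{row.direction_id}{row.zipped_stops}"
    return hashlib.sha1(key.encode("UTF-8")).hexdigest()[:18]


def add_pattern_id(frame):
    """Hypothetical helper, not part of gtfs_functions."""
    out = frame.copy()
    # A list comprehension yields exactly one value per row, so the result
    # is a plain list (possibly empty) and never a DataFrame.
    out["pattern_id"] = [version_hash(row) for row in out.itertuples(index=False)]
    return out


# Made-up sample data, only for illustration.
trips_with_stops = pd.DataFrame(
    {
        "route_id": ["r1", "r1", "r2"],
        "direction_id": [0, 0, 1],
        "zipped_stops": [("s1", "s2"), ("s1", "s2"), ("s3", "s4")],
    }
)

# 1) The pandas rule behind the message: a multi-column DataFrame cannot be
#    assigned to a single column.
try:
    broken = trips_with_stops.copy()
    broken["pattern_id"] = trips_with_stops[["route_id", "direction_id"]]
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")

# 2) The row-wise alternative behaves the same for non-empty and empty frames.
with_ids = add_pattern_id(trips_with_stops)
empty_with_ids = add_pattern_id(trips_with_stops.iloc[0:0])
print(with_ids["pattern_id"].nunique(), len(empty_with_ids))
```

If the frame really is empty at that point, the more useful question is why the preceding step produced no rows, for example a merge on keys with mismatched dtypes; that would need to be checked against the actual feed.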

Also, regarding the warning note in the README, I checked stop_times.txt for missing arrival times:

>>> stop_times['arrival_time'].isna().any()
False

and, similarly:

>>> stop_times['departure_time'].isna().any()
False
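
Both checks can be combined into one, sketched here under the assumption that the file was read as strings. Depending on the read options, an empty field may arrive as an empty string and not as NaN, in which case `isna()` alone would miss it. The sample rows and the `missing_time_counts` helper are made up for illustration.

```python
import io

import pandas as pd


def missing_time_counts(stop_times):
    """Hypothetical helper: count NaN or blank values per time column."""
    cols = ["arrival_time", "departure_time"]
    # fillna("") folds NaN and empty strings into the same "blank" case.
    blank = stop_times[cols].apply(
        lambda col: col.fillna("").astype(str).str.strip().eq("")
    )
    return blank.sum()


# Made-up three-row sample standing in for stop_times.txt.
sample = io.StringIO(
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "t1,08:00:00,08:01:00,s1,1\n"
    "t1,,,s2,2\n"
    "t1,25:10:00,25:11:00,s3,3\n"
)
stop_times = pd.read_csv(sample, dtype=str)

counts = missing_time_counts(stop_times)
print(counts)
```

A count of zero for both columns matches the two `False` results above.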

Any help is appreciated as this library looks extremely promising.