OpenTransitTools / gtfsdb

GTFS ORM using SQLAlchemy
Mozilla Public License 2.0
160 stars 45 forks source link

Error importing GTFS #9

Closed russellhoff closed 8 years ago

russellhoff commented 8 years ago

I'm trying to import this file (ftp://ftp.geo.euskadi.net/cartografia/Transporte/Moveuskadi/ATTG/lurraldebus_ekialdebus/google_transit.zip) into my database, but after executing the command

gtfsdb-load --database_url postgresql://postgres@ip:5432/lurraldebus_ekialdebus --is_geospatial google_transit.zip

I get the following error:

12:55:15,103 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: route_type 12:55:15,492 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: route_filters 12:55:15,810 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: feed_info 12:55:15,920 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: agency 12:55:16,110 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: calendar 12:55:16,318 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: calendar_dates 12:55:16,535 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: routes 12:55:16,985 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: route_directions 12:55:17,394 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: stops 12:55:17,722 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: stop_features 12:55:18,005 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: transfers 12:55:18,188 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: shapes 12:55:18,521 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: patterns 12:55:18,755 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: trips 12:55:19,130 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: stop_times 12:55:19,516 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: route_stops 12:55:19,925 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: frequencies 12:55:20,008 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: fare_attributes 12:55:20,108 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: fare_rules 12:55:20,341 DEBUG [/home/joninazio/automatizar/gtfsdb/gtfsdb/model/db.pyc] create table: universal_calendar 12:55:20,554 DEBUG [gtfsdb.model.gtfs] GTFS.load: /home/joninazio/descargas_automatizacion/lurraldebus_ekialdebus/20151105131037/google_transit.zip 12:55:20,701 DEBUG [gtfsdb.model.route] RouteType.load (0 seconds) 12:55:20,718 DEBUG [gtfsdb.model.route] RouteFilter.load (0 seconds) 12:55:20,718 DEBUG [gtfsdb.model.feed_info] FeedInfo.load (0 seconds) 12:55:20,733 DEBUG [gtfsdb.model.agency] Agency.load (0 seconds) 12:55:20,734 DEBUG [gtfsdb.model.calendar] Calendar.load (0 seconds) ***_12:56:19,349 DEBUG [gtfsdb.model.calendar] CalendarDate.load (59 seconds) 12:56:19,372 DEBUG [gtfsdb.model.route] Route.load (0 seconds) 12:56:19,372 DEBUG [gtfsdb.model.route] RouteDirection.load (0 seconds) 12:56:19,489 DEBUG [gtfsdb.model.stop] Stop.load (0 seconds) 12:56:19,489 DEBUG [gtfsdb.model.stop_feature] StopFeature.load (0 seconds) 12:56:19,489 DEBUG [gtfsdb.model.transfer] Transfer.load (0 seconds) *_****_12:57:08,222 DEBUG [gtfsdb.model.shape] Shape.load (49 seconds) 12:57:33,950 DEBUG [gtfsdb.model.shape] Pattern.load (26 seconds) 12:57:34,563 DEBUG [gtfsdb.model.trip] Trip.load (1 seconds) _Traceback (most recent call last): File "/home/joninazio/automatizar/gtfsdb/bin/gtfsdb-load", line 13, in sys.exit(gtfsdb.scripts.gtfsdb_load()) File "/home/joninazio/automatizar/gtfsdb/gtfsdb/scripts.py", line 10, in gtfsdb_load database_load(args.file, _kwargs) File "/home/joninazio/automatizar/gtfsdb/gtfsdb/api.py", line 19, in database_load gtfs.load(db, _kwargs) File "/home/joninazio/automatizar/gtfsdb/gtfsdb/model/gtfs.py", line 34, in load cls.load(db, _load_kwargs) File "/home/joninazio/automatizar/gtfsdb/gtfsdb/model/base.py", line 143, in load db.engine.execute(table.insert(), records) File "/home/joninazio/automatizar/gtfsdb/eggs/SQLAlchemy-1.0.9-py2.7-linux-i686.egg/sqlalchemy/engine/base.py", line 1991, in execute return connection.execute(statement, multiparams, *_params) File "/home/joninazio/automatizar/gtfsdb/eggs/SQLAlchemy-1.0.9-py2.7-linux-i686.egg/sqlalchemy/engine/base.py", line 914, in execute return meth(self, multiparams, params) File "/home/joninazio/automatizar/gtfsdb/eggs/SQLAlchemy-1.0.9-py2.7-linux-i686.egg/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "/home/joninazio/automatizar/gtfsdb/eggs/SQLAlchemy-1.0.9-py2.7-linux-i686.egg/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "/home/joninazio/automatizar/gtfsdb/eggs/SQLAlchemy-1.0.9-py2.7-linux-i686.egg/sqlalchemy/engine/base.py", line 1146, in _execute_context context) File "/home/joninazio/automatizar/gtfsdb/eggs/SQLAlchemy-1.0.9-py2.7-linux-i686.egg/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception exc_info File "/home/joninazio/automatizar/gtfsdb/eggs/SQLAlchemy-1.0.9-py2.7-linux-i686.egg/sqlalchemy/util/compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "/home/joninazio/automatizar/gtfsdb/eggs/SQLAlchemy-1.0.9-py2.7-linux-i686.egg/sqlalchemy/engine/base.py", line 1116, in _execute_context context) File "/home/joninazio/automatizar/gtfsdb/eggs/SQLAlchemy-1.0.9-py2.7-linux-i686.egg/sqlalchemy/engine/default.py", line 447, in do_executemany cursor.executemany(statement, parameters) sqlalchemy.exc.DataError: (psycopg2.DataError) value too long for type character varying(8) [SQL: 'INSERT INTO stop_times (trip_id, arrival_time, departure_time, stop_id, stop_sequence, stop_headsign, pickup_type, drop_off_type, shape_dist_traveled, timepoint) VALUES (%(trip_id)s, %(arrival_time)s, %(departure_time)s, %(stop_id)s, %(stop_sequence)s, %(stop_headsign)s, %(pickup_type)s, %(drop_off_type)s, %(shape_dist_traveled)s, %(timepoint)s)'] [parameters: ({'pickup_type': '0', 'stop_headsign': None, 'shape_dist_traveled': '2403.31', 'timepoint': False, 'arrival_time': '09:35:18', 'stop_sequence': '3', 'stop_id': '1446', 'drop_off_type': '1', 'trip_id': '4840', 'departure_time': '09:35:18'}, {'pickup_type': '0', 'stop_headsign': None, 'shape_dist_traveled': '7386.8', 'timepoint': False, 'arrival_time': '09:42:15', 'stop_sequence': '4', 'stop_id': '1469', 'drop_off_type': '1', 'trip_id': '4840', 'departure_time': '09:42:15'}, {'pickup_type': '0', 'stop_headsign': None, 'shape_dist_traveled': '15794.98', 'timepoint': False, 'arrival_time': '09:47:28', 'stop_sequence': '5', 'stop_id': '1396', 'drop_off_type': '1', 'trip_id': '4840', 'departure_time': '09:47:28'}, {'pickup_type': '0', 'stop_headsign': None, 'shape_dist_traveled': '19868.86', 'timepoint': False, 'arrival_time': '09:50:50', 'stop_sequence': '6', 'stop_id': '1397', 'drop_off_type': '1', 'trip_id': '4840', 'departure_time': '09:50:50'}, {'pickup_type': '0', 'stop_headsign': None, 'shape_dist_traveled': '20203.69', 'timepoint': False, 'arrival_time': '09:51:32', 'stop_sequence': '7', 'stop_id': '1492', 'drop_off_type': '1', 'trip_id': '4840', 'departure_time': '09:51:32'}, {'pickup_type': '0', 'stop_headsign': None, 'shape_dist_traveled': '20593.2', 'timepoint': False, 'arrival_time': '09:52:21', 'stop_sequence': '8', 'stop_id': '1493', 'drop_off_type': '1', 'trip_id': '4840', 'departure_time': '09:52:21'}, {'pickup_type': '0', 'stop_headsign': None, 'shape_dist_traveled': '21100.25', 'timepoint': False, 'arrival_time': '09:53:24', 'stop_sequence': '9', 'stop_id': '1494', 'drop_off_type': '1', 'trip_id': '4840', 'departure_time': '09:53:24'}, {'pickup_type': '0', 'stop_headsign': None, 'shape_dist_traveled': '21462.81', 'timepoint': False, 'arrival_time': '09:54:09', 'stop_sequence': '10', 'stop_id': '1495', 'drop_off_type': '1', 'trip_id': '4840', 'departure_time': '09:54:09'} ... displaying 10 of 8536 total bound parameter sets ... {'pickup_type': '0', 'stop_headsign': None, 'shape_dist_traveled': '7274.36', 'timepoint': False, 'arrival_time': '17:43:17', 'stop_sequence': '13', 'stop_id': '1478', 'drop_off_type': '0', 'trip_id': '38991', 'departure_time': '17:43:17'}, {'pickup_type': '0', 'stop_headsign': None, 'shape_dist_traveled': '7768.13', 'timepoint': False, 'arrival_time': '17:44:45', 'stop_sequence': '14', 'stop_id': '1479', 'drop_off_type': '0', 'trip_id': '38991', 'departure_time': '17:44:45'})]

Is there something wrong with the script, or it the gtfs file malformed?

Thank you !

fpurcell commented 8 years ago

Do you know if the feed passes Google's validator https://developers.google.com/transit/tools?hl=en ?

Judging by the error ("sqlalchemy.exc.DataError: (psycopg2.DataError) value too long for type character varying(8) ... [SQL: INSERT INTO stop_times...") it appears that either an arrival_time or departure_time value is not formatted as 'hh:mm:ss' (e.g., longer than 8 characters). Could be something as minor as an inadvertent space in one of the stop time strings, or possibly an encoding issue where a multi-byte colon character is used as opposed to the ascii colon character.

russellhoff commented 8 years ago

I've executed google's validator and the result is attached here (rename the file from png to html) validation-results

russellhoff commented 8 years ago

Looking at the error in stop_times.txt, I've found that several arrival_time and departure_time doesn't comply with your column data structure:

4924,72:00:03,72:00:03,1511,16,,0,0,8258.94 4924,120:01:07,120:01:07,1510,17,,0,0,8839.82 4924,168:02:27,168:02:27,1426,18,,0,0,9421.59 4924,216:05:00,216:05:00,1427,19,,0,0,10578.43 4924,264:06:26,264:06:26,1509,20,,0,0,11068.95 4924,312:07:32,312:07:32,1496,21,,0,0,11481.56 4924,360:08:22,360:08:22,1528,22,,0,0,12304.17 4924,408:09:16,408:09:16,1497,23,,1,0,13006.8 4924,456:09:50,456:09:50,1498,24,,1,0,13356.67 4924,504:11:41,504:11:41,1437,25,,1,0,13962.78 4924,552:13:33,552:13:33,1438,26,,1,0,14958.28 4924,600:14:09,600:14:09,1525,27,,1,0,15435.61 4924,648:14:33,648:14:33,1428,28,,1,0,15790.56 4924,696:15:53,696:15:53,1526,29,,1,0,16418.86 4924,744:16:53,744:16:53,1501,30,,1,0,17144.16 4925,05:15:00,05:15:00,1501,0,,0,1,0

There are several lines whose hours are 3 digit-length... This case is quite strange, but can happen. Would it be supported in your application?

fpurcell commented 8 years ago

3-digit hours seems to break a strict reading of the GTFS spec, ala "Times must be eight digits in HH:MM:SS format". https://developers.google.com/transit/gtfs/reference?hl=en#stop_timestxt

I'm going to discuss it with Mike later this week, and will let you know. In the meantime, stop_times.py could be edited on your end to make: arrival_time = Column(String(9)) departure_time = Column(String(9), index=True). I'm not sure how amicable we'll be to changing the code on our end...

Take care, Frank

russellhoff commented 8 years ago

Sorry for my delay. There seem to be some issue with the GTFS I mentioned you. We have reported to our authorities, and this doesn't seem to be a fault on your project (would it be supported the existence of trips lasting more that two-digit hours? Doesn't seem feasible...).