OneBusAway / onebusaway-gtfs-modules

A Java-based library for reading, writing, and transforming public transit data in the GTFS format, including database support.

Unable to load GTFS with missing required fields in order to fix it with transformer-cli #112

Open robbi5 opened 5 years ago


Summary:

I have a broken GTFS file in which some stops in stops.txt are missing their coordinates (stop_lat, stop_lon). To fix it, I wanted to use the transformer-cli update operation, hoping I could simply add the coordinates with rules like this one:

{"op":"update", "match":{"file":"stops.txt", "stop_id":"de:08237:4004"}, "update":{"stop_lat":"48.474133", "stop_lon":"8.488702"}}

Sadly, this fails because transformer-cli aborts with MissingRequiredFieldException: missing required field: stop_lat
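Before writing one rule per stop, it helps to list exactly which rows are affected. The following is a minimal diagnostic sketch in plain Java that uses no onebusaway-gtfs classes. It assumes stops.txt has been extracted from the feed, has a header row with stop_id, stop_lat and stop_lon columns, and contains no quoted fields that span several lines. The class name MissingCoordinateScanner is hypothetical. Each affected row is printed as physical line number and stop_id, counting the header as line 1; this may not match the reader's own lineNumber value exactly.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Lists stops.txt rows whose stop_lat or stop_lon is empty (sketch with simplified CSV parsing). */
public class MissingCoordinateScanner {

    /** Splits one CSV line, honouring double quotes and "" escapes; multi-line fields are not supported. */
    static List<String> parseCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** True if the column is absent from the row or holds only whitespace. */
    static boolean isBlank(List<String> row, int col) {
        return col < 0 || col >= row.size() || row.get(col).trim().isEmpty();
    }

    /** Returns "lineNumber:stop_id" for each data row missing a coordinate; the header is line 1. */
    static List<String> findMissing(List<String> lines) {
        // Strip a UTF-8 byte order mark, which some GTFS exports put before the header
        List<String> header = parseCsvLine(lines.get(0).replace("\uFEFF", ""));
        int idCol = header.indexOf("stop_id");
        int latCol = header.indexOf("stop_lat");
        int lonCol = header.indexOf("stop_lon");
        List<String> missing = new ArrayList<>();
        for (int n = 1; n < lines.size(); n++) {
            if (lines.get(n).isEmpty()) {
                continue; // skip blank lines
            }
            List<String> row = parseCsvLine(lines.get(n));
            if (isBlank(row, latCol) || isBlank(row, lonCol)) {
                String stopId = isBlank(row, idCol) ? "?" : row.get(idCol);
                missing.add((n + 1) + ":" + stopId);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to stops.txt extracted from the feed (e.g. from vgc.zip)
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (String entry : findMissing(lines)) {
            System.out.println(entry);
        }
    }
}
```

After compiling with javac, run it as `java MissingCoordinateScanner stops.txt`; each printed stop_id then needs one update rule.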

Steps to reproduce:

Download the GTFS feed from http://ubahndepot.com/storage/opendata/gtfs-scrapes/vgc-calw/gtfs-vgc-calw-2019-03-07--2019-06-08.zip and save it as vgc.zip.

Rules file vgc.rule:

## Fix missing lat/lon:
# de:08237:4004,Dornstetten Hochgericht,,,,nses/7210214_26.json
# de:08237:4506,Pfalzgrafenweiler Sportanlagen,,,,nses/7210214_26.json
# de:08237:4645,Bösingen Ortsausgang,,,,nses/7210214_26.json
# de:08237:5122,Eutingen im Gäu Daimlerstraße,,,,nses/7210327_74.json
{"op":"update", "match":{"file":"stops.txt", "stop_id":"de:08237:4004"}, "update":{"stop_lat":"48.474133", "stop_lon":"8.488702"}}
{"op":"update", "match":{"file":"stops.txt", "stop_id":"de:08237:4506"}, "update":{"stop_lat":"48.530459", "stop_lon":"8.577677"}}
{"op":"update", "match":{"file":"stops.txt", "stop_id":"de:08237:4645"}, "update":{"stop_lat":"48.538909", "stop_lon":"8.596061"}}
{"op":"update", "match":{"file":"stops.txt", "stop_id":"de:08237:5122"}, "update":{"stop_lat":"48.47754", "stop_lon":"8.74097"}}

Command: java -Xmx6g -jar onebusaway-gtfs-transformer-cli.jar --transform=vgc.rule vgc.zip vgc.fixed.zip

Expected behavior:

transformer-cli might warn about the missing required fields, but it should still let the update operation run and produce a GTFS file with stop_lat and stop_lon filled in.
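A minimal sketch of the behaviour asked for here, assuming a row-by-row reader: in lenient mode an empty required value is recorded as a warning and the row is kept, so that a later update rule can fill it; in strict mode it fails on the first missing field, as the current reader does. This is not onebusaway-gtfs code. The class LenientRequiredFieldCheck, its nested MissingFieldException and the check method are hypothetical names chosen for illustration; the real reader raises MissingRequiredFieldException instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical "warn, don't abort" check for required fields; not part of onebusaway-gtfs. */
public class LenientRequiredFieldCheck {

    /** Thrown in strict mode; loosely mirrors the message of MissingRequiredFieldException. */
    static class MissingFieldException extends RuntimeException {
        MissingFieldException(String field) {
            super("missing required field: " + field);
        }
    }

    private final boolean strict;
    private final List<String> warnings = new ArrayList<>();

    LenientRequiredFieldCheck(boolean strict) {
        this.strict = strict;
    }

    /** Checks one parsed row; in lenient mode it records warnings and still returns the row. */
    Map<String, String> check(Map<String, String> row, List<String> requiredFields, int lineNumber) {
        for (String field : requiredFields) {
            String value = row.get(field);
            if (value == null || value.trim().isEmpty()) {
                if (strict) {
                    throw new MissingFieldException(field);
                }
                warnings.add("line " + lineNumber + ": missing required field: " + field);
            }
        }
        return row;
    }

    List<String> getWarnings() {
        return warnings;
    }

    public static void main(String[] args) {
        // Illustrative row modelled on one of the broken stops (line number is made up)
        Map<String, String> row = new HashMap<>();
        row.put("stop_id", "de:08237:4004");
        row.put("stop_name", "Dornstetten Hochgericht");
        row.put("stop_lat", "");
        row.put("stop_lon", "");
        LenientRequiredFieldCheck lenient = new LenientRequiredFieldCheck(false);
        lenient.check(row, Arrays.asList("stop_id", "stop_lat", "stop_lon"), 2);
        lenient.getWarnings().forEach(System.out::println);
    }
}
```

The design choice is simply where the decision is made: the row is still handed on, and it is up to the caller (here, the transformer) to either fix it or reject it before writing output.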

Observed behavior: transformer-cli doesn't even load the GTFS, because stop_lat is a required field:

2019-03-24 10:51:45,959 INFO  [GtfsTransformerMain.java:191] : input path: vgc.zip
2019-03-24 10:51:46,084 INFO  [GtfsTransformerMain.java:196] : output path: vgc.fixed.zip
2019-03-24 10:51:46,091 INFO  [GtfsTransformer.java:192] : reading gtfs from vgc.zip
2019-03-24 10:51:46,092 INFO  [GtfsReader.java:178] : reading entities: org.onebusaway.gtfs.model.Agency
2019-03-24 10:51:46,099 INFO  [GtfsReader.java:178] : reading entities: org.onebusaway.gtfs.model.Block
2019-03-24 10:51:46,099 INFO  [GtfsReader.java:178] : reading entities: org.onebusaway.gtfs.model.ShapePoint
2019-03-24 10:51:46,099 INFO  [GtfsReader.java:178] : reading entities: org.onebusaway.gtfs.model.Note
2019-03-24 10:51:46,099 INFO  [GtfsReader.java:178] : reading entities: org.onebusaway.gtfs.model.Area
2019-03-24 10:51:46,099 INFO  [GtfsReader.java:178] : reading entities: org.onebusaway.gtfs.model.Route
2019-03-24 10:51:46,125 INFO  [GtfsReader.java:178] : reading entities: org.onebusaway.gtfs.model.Stop
org.onebusaway.csv_entities.exceptions.CsvEntityIOException: io error: entityType=org.onebusaway.gtfs.model.Stop path=stops.txt lineNumber=619
    at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:161)
    at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:120)
    at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:115)
    at org.onebusaway.gtfs.serialization.GtfsReader.run(GtfsReader.java:180)
    at org.onebusaway.gtfs.serialization.GtfsReader.run(GtfsReader.java:168)
    at org.onebusaway.gtfs_transformer.GtfsTransformer.readGtfs(GtfsTransformer.java:194)
    at org.onebusaway.gtfs_transformer.GtfsTransformer.run(GtfsTransformer.java:157)
    at org.onebusaway.gtfs_transformer.GtfsTransformerMain.runApplication(GtfsTransformerMain.java:247)
    at org.onebusaway.gtfs_transformer.GtfsTransformerMain.run(GtfsTransformerMain.java:106)
    at org.onebusaway.gtfs_transformer.GtfsTransformerMain.main(GtfsTransformerMain.java:85)
Caused by: org.onebusaway.csv_entities.exceptions.MissingRequiredFieldException: missing required field: stop_lat
    at org.onebusaway.csv_entities.schema.AbstractFieldMapping.isMissingAndOptional(AbstractFieldMapping.java:100)
    at org.onebusaway.csv_entities.schema.DefaultFieldMapping.translateFromCSVToObject(DefaultFieldMapping.java:43)
    at org.onebusaway.csv_entities.IndividualCsvEntityReader.readEntity(IndividualCsvEntityReader.java:131)
    at org.onebusaway.csv_entities.IndividualCsvEntityReader.handleLine(IndividualCsvEntityReader.java:98)
    at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:157)
    ... 9 more
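Until the reader can load such rows, one workaround is to patch stops.txt before the transformer ever sees it. A minimal sketch in plain Java, assuming stops.txt has a header row that contains stop_id, stop_lat and stop_lon columns and no quoted fields spanning several lines. The class name StopCoordinatePatcher is hypothetical; the coordinates are the ones from vgc.rule. Only empty coordinates are filled, existing values are never overwritten, and rows without a fix are copied verbatim.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Fills empty stop_lat/stop_lon values in stops.txt from a stop_id -> {lat, lon} map (sketch). */
public class StopCoordinatePatcher {

    /** Splits one CSV line, honouring double quotes and "" escapes; multi-line fields are not supported. */
    static List<String> parseCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes && c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                cur.append('"'); // escaped quote inside a quoted field
                i++;
            } else if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    /** Quotes a field only when it contains a comma, a quote or a line break. */
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Returns the patched lines; assumes the header contains stop_id, stop_lat and stop_lon. */
    static List<String> patch(List<String> lines, Map<String, String[]> coords) {
        List<String> header = parseCsvLine(lines.get(0).replace("\uFEFF", "")); // drop UTF-8 BOM
        int idCol = header.indexOf("stop_id");
        int latCol = header.indexOf("stop_lat");
        int lonCol = header.indexOf("stop_lon");
        List<String> out = new ArrayList<>();
        out.add(lines.get(0));
        for (int n = 1; n < lines.size(); n++) {
            String line = lines.get(n);
            List<String> row = parseCsvLine(line);
            String[] fix = idCol >= 0 && idCol < row.size() ? coords.get(row.get(idCol)) : null;
            if (fix == null) {
                out.add(line); // untouched rows keep their original formatting
                continue;
            }
            while (row.size() < header.size()) {
                row.add(""); // pad short rows so the coordinate columns exist
            }
            if (row.get(latCol).trim().isEmpty()) {
                row.set(latCol, fix[0]);
            }
            if (row.get(lonCol).trim().isEmpty()) {
                row.set(lonCol, fix[1]);
            }
            List<String> quoted = new ArrayList<>();
            for (String field : row) {
                quoted.add(quote(field));
            }
            out.add(String.join(",", quoted));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Coordinates taken from the update rules in vgc.rule
        Map<String, String[]> coords = new HashMap<>();
        coords.put("de:08237:4004", new String[] {"48.474133", "8.488702"});
        coords.put("de:08237:4506", new String[] {"48.530459", "8.577677"});
        coords.put("de:08237:4645", new String[] {"48.538909", "8.596061"});
        coords.put("de:08237:5122", new String[] {"48.47754", "8.74097"});
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), patch(lines, coords), StandardCharsets.UTF_8);
    }
}
```

Usage, assuming stops.txt has been extracted from vgc.zip: compile it, run `java StopCoordinatePatcher stops.txt stops.fixed.txt`, then put the patched file back into the archive as stops.txt (the unzip/zip steps are not shown). Files.write uses the platform line separator, which GTFS consumers generally accept.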

Platform:

For reproducibility, I tested this with onebusaway-gtfs-transformer-cli v1.3.63 and Java 1.8.0_202 on macOS 10.14:

java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)

Normally, I use the HSL Digitransit stack, specifically the OpenTripPlanner-data-container, which is built on the Docker base image openjdk:8-jre and uses onebusaway-gtfs-transformer-cli v1.3.9:

openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-8u181-b13-2~deb9u1-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)