MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0

Parsing failed message, but no details in report.json #858

Closed machow closed 3 years ago

machow commented 3 years ago

Bug report

Hey y'all--this issue might be me misunderstanding when the validator logs a big PARSING FAILED message. I tried to supply everything needed to replicate it and show where the message led me, but apologies if I'm interpreting it wrong!

Describe the bug

When parsing a zip archive with the gtfs-validator, sometimes it displays a message like below:

Apr 19, 2021 7:24:31 PM org.mobilitydata.gtfsvalidator.table.GtfsAgencyTableLoader load
SEVERE: Failed to parse some rows in agency.txt
 ----------------------------------------- 
|       !!!    PARSING FAILED    !!!      |
|   Most validators were never invoked.   |
|   Please see report.json for details.   |
 ----------------------------------------- 
Validation took 3.402 seconds
agency.txt  UNPARSABLE_ROWS

However, in the report.json, there doesn't seem to be an indication of what the parsing failure was. Below is the only section of report.json related to agency.txt.

      {
         "code":"invalid_phone_number",
         "severity":"ERROR",
         "totalNotices":1,
         "notices":[
            {
               "filename":"agency.txt",
               "csvRowNumber":2,
               "fieldName":"agency_phone",
               "fieldValue":"8004117245"
            }
         ]
      },
Full report.json:

```
{
   "notices":[
      {
         "code":"decreasing_or_equal_shape_distance",
         "severity":"ERROR",
         "totalNotices":2,
         "notices":[
            {
               "shapeId":"r9ei",
               "csvRowNumber":255,
               "shapeDistTraveled":0.0,
               "shapePtSequence":1,
               "prevCsvRowNumber":254,
               "prevShapeDistTraveled":0.0,
               "prevShapePtSequence":0
            },
            {
               "shapeId":"r9ei",
               "csvRowNumber":256,
               "shapeDistTraveled":0.0,
               "shapePtSequence":2,
               "prevCsvRowNumber":255,
               "prevShapeDistTraveled":0.0,
               "prevShapePtSequence":1
            }
         ]
      },
      {
         "code":"invalid_phone_number",
         "severity":"ERROR",
         "totalNotices":1,
         "notices":[
            {
               "filename":"agency.txt",
               "csvRowNumber":2,
               "fieldName":"agency_phone",
               "fieldValue":"8004117245"
            }
         ]
      },
      {
         "code":"unknown_file",
         "severity":"INFO",
         "totalNotices":4,
         "notices":[
            {
               "filename":"calendar_attributes.txt"
            },
            {
               "filename":"rider_categories.txt"
            },
            {
               "filename":"directions.txt"
            },
            {
               "filename":"farezone_attributes.txt"
            }
         ]
      }
   ]
}
```
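For anyone trying to trace a PARSING FAILED message back to report.json, here is a minimal Python sketch that lists every notice naming a given file. It assumes the report layout shown in this issue (a top-level "notices" list of groups, each with its own "notices" list); `notices_for_file` and `SAMPLE_REPORT` are hypothetical names, not part of the validator, and the sample is trimmed from the report above.

```python
import json


def notices_for_file(report, filename):
    """Return (code, severity, notice) tuples for notices that name `filename`."""
    matches = []
    for group in report.get("notices", []):
        for notice in group.get("notices", []):
            # Not every notice type carries a "filename" field (assumption
            # based on the shape-distance notices in this report).
            if notice.get("filename") == filename:
                matches.append((group["code"], group["severity"], notice))
    return matches


# Sample trimmed from the report.json in this issue.
SAMPLE_REPORT = json.loads("""
{
  "notices": [
    {
      "code": "invalid_phone_number",
      "severity": "ERROR",
      "totalNotices": 1,
      "notices": [
        {"filename": "agency.txt", "csvRowNumber": 2,
         "fieldName": "agency_phone", "fieldValue": "8004117245"}
      ]
    },
    {
      "code": "unknown_file",
      "severity": "INFO",
      "totalNotices": 1,
      "notices": [{"filename": "directions.txt"}]
    }
  ]
}
""")

for code, severity, notice in notices_for_file(SAMPLE_REPORT, "agency.txt"):
    print(code, severity, notice["csvRowNumber"], notice["fieldName"])
```

On this sample the only match for agency.txt is the invalid_phone_number notice, which is the entry that turned out to be behind the parse failure.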

How we reproduce the bug

I'll download from a url below, but here is a zipfile just in case: gtfs.zip. (In the code below, replace env var $GTFS_VALIDATOR_JAR with a path to the validator jar file).

java -jar $GTFS_VALIDATOR_JAR -u 'https://transitfeeds.com/p/altamont-corridor-express/823/latest/download' -o output -f na-na

Expected behaviour

An entry in report.json explaining the parse error.

Observed behaviour

An entry in report.json that seems to be unrelated to the parse error (on a misformatted phone number). Note that I'm able to read the data in agency.txt, using the python library pandas, so am not sure how to interpret the parse error.
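A rough standard-library equivalent of that pandas check, as a sketch: it only confirms that agency.txt is well-formed CSV inside the zip, which says nothing about whether the validator accepts the field values. `read_gtfs_table` is a hypothetical helper, and the UTF-8 encoding is an assumption.

```python
import csv
import io
import zipfile


def read_gtfs_table(zip_source, table_name="agency.txt"):
    """Read one table from a GTFS zip and return its rows as dicts."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(table_name) as raw:
            # utf-8-sig drops a leading byte-order mark if one is present.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


# Usage, assuming the feed was saved locally as gtfs.zip:
# rows = read_gtfs_table("gtfs.zip")
# print(rows[0]["agency_phone"])
```

If this reads every row cleanly, the CSV structure is fine and the failure is more likely in field-level validation.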

Environment versions

lionel-nj commented 3 years ago

Hi @machow! Thanks for taking the time to open an issue.

Two things:

  1. It seems like you tried to validate a GTFS dataset from an agency located in the US. In this case, your CLI input should be as follows:

java -jar $GTFS_VALIDATOR_JAR -u 'https://transitfeeds.com/p/altamont-corridor-express/823/latest/download' -o output -f us-altamont

instead of

java -jar $GTFS_VALIDATOR_JAR -u 'https://transitfeeds.com/p/altamont-corridor-express/823/latest/download' -o output -f na-na

When you provide na-na as the GtfsFeedName, the validator interprets the first part, na, as the country code for Namibia (https://laendercode.net/en/2-letter-code/na), hence the big parsing error.

  2. Thanks for flagging this. We are working on deprecating -f (which can be misleading, as this issue suggests) and moving towards using -c (as an optional CLI parameter). This should be included in our next release and will help with running the validator programmatically.

machow commented 3 years ago

Ah, thanks! That makes total sense--from looking at the readme again, it sounds like the first piece of -f is currently used in parsing rules (e.g. for country specific phone numbers, zip code formats?), while the second piece could be set to anything. Is that right?

In any event, thanks again for your help!
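The split described in this comment can be sketched in Python as below. This is only an illustration of the `<country>-<name>` convention discussed in this thread, not the validator's actual code; `split_feed_name` is a hypothetical helper, and the two-letter check is an assumption based on the linked country-code list.

```python
def split_feed_name(feed_name):
    """Split a '<country>-<name>' feed name into (country_code, rest)."""
    country, sep, rest = feed_name.partition("-")
    # Assumption: the first piece is a two-letter country code.
    if not sep or len(country) != 2 or not country.isalpha():
        raise ValueError(
            f"expected '<two-letter country>-<name>', got {feed_name!r}"
        )
    return country.upper(), rest


print(split_feed_name("na-na"))        # ('NA', 'na')
print(split_feed_name("us-altamont"))  # ('US', 'altamont')
```

Seen this way, na-na is a valid-looking feed name whose country piece happens to be Namibia's code, which is why a US phone number fails against it.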

lionel-nj commented 3 years ago

> Ah, thanks! That makes total sense--from looking at the readme again, it sounds like the first piece of -f is currently used in parsing rules (e.g. for country specific phone numbers, zip code formats?), while the second piece could be set to anything. Is that right?

Exactly!

> In any event, thanks again for your help!

Pleasure!