MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0
278 stars 100 forks source link

CI - SEVERE: Cannot load GTFS feed - error in opening zip file #768

Closed barbeau closed 3 years ago

barbeau commented 3 years ago

Describe the bug I was just looking at the executions on CI - for example, the run on 100 feeds here: https://github.com/MobilityData/gtfs-validator/runs/1963330729?check_suite_focus=true

From a glance it looks like the executions against all the feeds are failing with:

Feb 23, 2021 5:23:14 PM org.mobilitydata.gtfsvalidator.cli.Main main
SEVERE: Cannot load GTFS feed
java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:230)
    at java.util.zip.ZipFile.<init>(ZipFile.java:160)
    at java.util.zip.ZipFile.<init>(ZipFile.java:174)
    at org.mobilitydata.gtfsvalidator.input.GtfsZipFileInput.<init>(GtfsZipFileInput.java:37)
    at org.mobilitydata.gtfsvalidator.input.GtfsInput.createFromPath(GtfsInput.java:61)
    at org.mobilitydata.gtfsvalidator.input.GtfsInput.createFromUrl(GtfsInput.java:83)
    at org.mobilitydata.gtfsvalidator.cli.Main.main(Main.java:78)

..but the CI is showing a successful run.

We should fix this to ensure the feeds are downloaded and validated, and also fix CI to make sure when this type of error occurs the CI run fails.

How we reproduce the bug

  1. Look at CI execution https://github.com/MobilityData/gtfs-validator/runs/1963330729?check_suite_focus=true

Expected behaviour All feeds should be successfully downloaded and validated

When there is a failure like SEVERE: Cannot load GTFS feed, the CI build should fail and not pass.

Observed behaviour Feed parsing is failing with the above error, but the CI build is green saying it passed.

barbeau commented 3 years ago

Looks like CI was working on the merge for the fix for StopTooFarFromTripShape (https://github.com/MobilityData/gtfs-validator/commit/9d9b25f494f6a04b593d5607eecc7e4b42e6fa96) - this execution downloaded and successfully ran on all feeds: https://github.com/MobilityData/gtfs-validator/runs/1962427135

So maybe it was the change to Java 8 that broke something?

lionel-nj commented 3 years ago

From a quick look, it seems that this behavior is witnessed after the downgrade to Java 8 https://github.com/MobilityData/gtfs-validator/actions/runs/593048170 (from https://github.com/MobilityData/gtfs-validator/pull/759). Investigating further.

lionel-nj commented 3 years ago

The end_to_end workflow seems to be working https://github.com/MobilityData/gtfs-validator/runs/1963330711?check_suite_focus=true (even after merging https://github.com/MobilityData/gtfs-validator/pull/759 and #750). Will take a look at both files to see if there is a difference that could explain this behavior.

lionel-nj commented 3 years ago

By curiosity I successfully re-ran the same job. In the first run, no datasets could be extracted as .zip. Could this be due to a temporary failure from transitfeeds now openmobilitydata? See execution: https://github.com/MobilityData/gtfs-validator/runs/1963330729?check_suite_focus=true

Capture d’écran, le 2021-02-23 à 13 36 51

I also notice that in this workflow we use the latest version of datasets. For consistency, should we use fixed versions of datasets?

[...] fix CI to make sure when this type of error occurs the CI run fails. On it!

barbeau commented 3 years ago

When I look at https://github.com/MobilityData/gtfs-validator/runs/1963330729?check_suite_focus=true, I still see the same failure - for example, under Ruter for Norway:

Run java -jar main/build/libs/*.jar --url http://transitfeeds.com/p/ruter/240/latest/download --output_base output --feed_name no-ruter --storage_directory ruter.zip
Feb 23, 2021 5:23:02 PM org.mobilitydata.gtfsvalidator.cli.Main main
Feed name: no-ruter
SEVERE: Cannot load GTFS feed
java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:230)
    at java.util.zip.ZipFile.<init>(ZipFile.java:160)
    at java.util.zip.ZipFile.<init>(ZipFile.java:174)
    at org.mobilitydata.gtfsvalidator.input.GtfsZipFileInput.<init>(GtfsZipFileInput.java:37)
    at org.mobilitydata.gtfsvalidator.input.GtfsInput.createFromPath(GtfsInput.java:61)
    at org.mobilitydata.gtfsvalidator.input.GtfsInput.createFromUrl(GtfsInput.java:83)
    at org.mobilitydata.gtfsvalidator.cli.Main.main(Main.java:78)

It would help if we logged network failures in a way that could be identified separately from zip file errors - I don't think we do now? If we aren't, it could be a network error, although I can use the same URL on my machine and it works - http://transitfeeds.com/p/ruter/240/latest/download.

I also notice that in this workflow we use the latest version of datasets. For consistency, should we use fixed versions of datasets?

For now I think it's fine to always run against the latest versions of feeds for these tests. This will help us catch problems as the datasets evolve. When we're explicitly checking the validator output for feeds we should use a fixed version so the number of errors and warnings doesn't change.

lionel-nj commented 3 years ago

When I look at https://github.com/MobilityData/gtfs-validator/runs/1963330729?check_suite_focus=true, I still see the same failure - for example, under Ruter for Norway:

My bad, the workflow log can be consulted here: https://github.com/MobilityData/gtfs-validator/runs/1963851590?check_suite_focus=true

It would help if we logged network failures in a way that could be identified separately from zip file errors -

Definitely yes.

I don't think we do now? If we aren't, it could be a network error,

No we do not. A network error would be my best guess of what happened there. On the model of what we do when a GTFS file could not be parsed, we could print an error message specifying that a network error happened. This would require to check the response code of the HTTP request

https://github.com/MobilityData/gtfs-validator/blob/dc33783aac7536b17acb3a177f43ebbea2f204eb/core/src/main/java/org/mobilitydata/gtfsvalidator/input/GtfsInput.java#L114

For now I think it's fine to always run against the latest versions of feeds for these tests. This will help us catch problems as the datasets evolve. When we're explicitly checking the validator output for feeds we should use a fixed version so the number of errors and warnings doesn't change.

👍🏾

barbeau commented 3 years ago

Ok - if this was a one-off IO error, and given that these errors are added to the notice container, let's close this issue and we'll deal with it as part of https://github.com/MobilityData/gtfs-validator/issues/734 when we start examining the output of the validator. We can check for the presence of any error notices there and halt CI if we see them.