MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0

Specifying URL from OpenMobilityData BART GTFS --> empty zip #379

Closed e-lo closed 3 years ago

e-lo commented 3 years ago

Describe the bug: When I point the URL at the latest BART feed hosted on OpenMobilityData, the validator ends up with an empty zip file.

To Reproduce: java -jar gtfs-validator-v1.2.2.jar -u http://transitfeeds.com/p/bart/58/latest/download

Expected behavior: The validator downloads the zip to a location and is able to unzip it.

Note that this could be an issue with OpenMobilityData, but in that case the validator should tell me so.

Witnessed behavior: The logs show that it is downloading and unzipping the file, but then it finds the file empty.

Environment used

Additional context

Traceback

# java -jar gtfs-validator-v1.2.2.jar -u http://transitfeeds.com/p/bart/58/latest/download
[WARN ] 2020-09-23 21:11:09.953 [main] Main - Configuration file execution-parameters.json not found in working directory
[INFO ] 2020-09-23 21:11:09.964 [main] Main - Retrieving execution parameters from command-line
[INFO ] 2020-09-23 21:11:10.732 [main] Main - --url provided but no location to place zip (--zip option). Using default: /home/gtfs/input.zip
[INFO ] 2020-09-23 21:11:10.733 [main] Main - --input not provided. Will extract zip content in: /home/gtfs/input
[INFO ] 2020-09-23 21:11:10.734 [main] Main - --output not provided. Will place execution results in: /home/gtfs/output
[INFO ] 2020-09-23 21:11:10.737 [main] Main - Downloading archive
[INFO ] 2020-09-23 21:11:12.800 [main] Main - Unzipping archive
[ERROR] 2020-09-23 21:11:12.803 [main] Main - An exception occurred: java.util.zip.ZipException: zip file is empty
[INFO ] 2020-09-23 21:11:12.805 [main] Main - Took 00h 00m 04s
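
The `ZipException` in the log above only surfaces at unzip time, which hides whether the download itself went wrong. As a rough sketch of the "the validator should tell me" expectation, a check like the following could run between the download and the unzip step. The class and method names (`ZipSanityCheck`, `describeProblem`) are hypothetical and not part of gtfs-validator; the only fact relied on is that ZIP archives begin with the two bytes `PK`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper, not part of gtfs-validator.
class ZipSanityCheck {

    /** Returns a human-readable problem description, or empty if the file looks like a ZIP. */
    static Optional<String> describeProblem(Path file) throws IOException {
        if (!Files.exists(file)) {
            return Optional.of("file does not exist: " + file);
        }
        if (Files.size(file) == 0) {
            return Optional.of("downloaded file is empty (0 bytes); "
                    + "the server may have answered with a redirect or an error");
        }
        byte[] head = new byte[2];
        try (InputStream in = Files.newInputStream(file)) {
            // ZIP archives start with the signature bytes 'P' 'K'.
            int read = in.readNBytes(head, 0, head.length);
            if (read < 2 || head[0] != 'P' || head[1] != 'K') {
                return Optional.of("downloaded file is not a ZIP archive "
                        + "(possibly an HTML page served instead of the feed)");
            }
        }
        return Optional.empty();
    }
}
```

Reporting the returned message before attempting extraction would distinguish "nothing was downloaded" from "a corrupt archive was downloaded".
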
barbeau commented 3 years ago

Similar to my comment at https://github.com/MobilityData/gtfs-validator/issues/378#issuecomment-697941183, this sounds exactly like a redirect issue and the gtfs-validator not handling them correctly.

For debugging reference: we had a similar issue with the GTFS-realtime validator. See https://github.com/CUTR-at-USF/gtfs-realtime-validator/issues/89 for potential code changes to handle this correctly.
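
To illustrate what handling redirects could look like, here is a minimal sketch, assuming the download goes through `java.net.HttpURLConnection`. This is not the validator's actual code, and `RedirectAwareDownloader` is a hypothetical name. One relevant detail of the JDK: `HttpURLConnection` does not automatically follow a redirect that switches protocol (for example `http` to `https`), so the sketch follows the `Location` header by hand with a hop limit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of gtfs-validator.
class RedirectAwareDownloader {
    private static final int MAX_REDIRECTS = 5; // arbitrary limit to avoid redirect loops

    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    static void download(String sourceUrl, Path target) throws IOException {
        URI current = URI.create(sourceUrl);
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.toURL().openConnection();
            // Handle redirects ourselves so cross-protocol hops are followed too.
            conn.setInstanceFollowRedirects(false);
            try {
                int status = conn.getResponseCode();
                if (isRedirect(status)) {
                    String location = conn.getHeaderField("Location");
                    if (location == null) {
                        throw new IOException("Redirect without Location header from " + current);
                    }
                    // Location may be relative, so resolve it against the current URI.
                    current = current.resolve(location);
                    continue;
                }
                if (status != HttpURLConnection.HTTP_OK) {
                    throw new IOException("Unexpected HTTP status " + status + " from " + current);
                }
                try (InputStream in = conn.getInputStream()) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                return;
            } finally {
                conn.disconnect();
            }
        }
        throw new IOException("Too many redirects starting from " + sourceUrl);
    }
}
```

With this sketch, a call such as `RedirectAwareDownloader.download("http://transitfeeds.com/p/bart/58/latest/download", Paths.get("input.zip"))` would either write the final response body to `input.zip` or fail with an explicit message, rather than silently saving an empty redirect response.
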

ghost commented 3 years ago

Hi, thanks for reporting. We could reproduce the issue. The validator does indeed handle redirection incorrectly. While we intend to implement a permanent solution, we can provide a workaround in this specific case, which is to use the updated URL https://openmobilitydata.org/p/mbta/64/latest/download

ghost commented 3 years ago

This is probably related

[WARN ] 2020-10-20 21:12:00.176 [main] Main - Configuration file execution-parameters.json not found in working directory
[INFO ] 2020-10-20 21:12:00.180 [main] Main - Retrieving execution parameters from command-line
[INFO ] 2020-10-20 21:12:00.560 [main] Main - Downloading archive
[INFO ] 2020-10-20 21:12:13.504 [main] Main - Unzipping archive
Error: 2020-10-20 21:12:13.507 [main] Main - Error detected -- ABORTING
[INFO ] 2020-10-20 21:12:13.532 [main] Main - Results are exported as JSON by default
[INFO ] 2020-10-20 21:12:13.532 [main] Main - Computed relative path for report file: output/__2020-10-20_17-12-13.508212
[INFO ] 2020-10-20 21:12:13.552 [main] Main - Exporting validation repo content:[ Notice{filename='input.zip', level='ERROR', code='8', title='Unzipping error', description='An error occurred while trying to unzip archive: input.zip', extra='{}'}]
[INFO ] 2020-10-20 21:12:13.571 [main] Main - Set option -abort_on_error to false for validation process to continue on errors
[INFO ] 2020-10-20 21:12:13.572 [main] Main - Took 00h 00m 14s

https://github.com/MobilityData/gtfs-validator/runs/1283395475?check_suite_focus=true

Given that we have merged #433, I now suspect the problem lies in the cleanup and immediate zip extraction into the input path when the validator is run multiple times with the same configuration. Potential solution to explore: clean the extraction path before downloading the archive, instead of doing everything at once at zip extraction time.
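
A minimal sketch of that potential solution, assuming the extraction path is a plain directory on the local file system. `ExtractionPathCleaner` and `resetDirectory` are hypothetical names, not existing validator code. The idea is to call it once before the download starts, so that leftovers from a previous run cannot interfere with extraction.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of gtfs-validator.
class ExtractionPathCleaner {

    /** Deletes dir and everything under it if present, then recreates it empty. */
    static void resetDirectory(Path dir) throws IOException {
        if (Files.exists(dir)) {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(dir)) {
                // Reverse order puts children before their parents,
                // so each directory is empty by the time it is deleted.
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path path : paths) {
                Files.delete(path);
            }
        }
        Files.createDirectories(dir);
    }
}
```

The intended order would then be: reset the extraction directory, download the archive, and only afterwards unzip into the now-empty directory.
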

ghost commented 3 years ago

Work has started on this issue, as demonstrated by the Touch commit, @e-lo: https://github.com/MobilityData/gtfs-validator/runs/1316365688?check_suite_focus=true

ghost commented 3 years ago

Good news: the workflow execution did not lead to the same error, meaning the zip issue was fixed. I suspected as much, but we didn't clearly identify the source of the issue. I invite you to consult the dataset and report produced by the validator.

https://github.com/MobilityData/gtfs-validator/actions/runs/331932433

This time I am attaching them to the issue as a convenience :)

validation_report_all.zip bart.zip

Thanks again for opening the issue. You'll see when opening new issues that we tried to take your feedback and our interactions and integrate them into a more streamlined process, to the benefit of our sponsors and members as well as the larger open source community! ❤️

Also, do not hesitate to open a new issue of type 0. if you see anything funky in the report 😅 https://github.com/MobilityData/gtfs-validator/issues/new/choose

ghost commented 3 years ago

Note we will merge the associated PR to run on BART data on all future contributions! https://github.com/MobilityData/gtfs-validator/pull/462

As soon as the review from @lionel-nj is in 🎂

ghost commented 3 years ago

Turns out I used the wrong URL and the bug is still present. Keeping this issue open for our next planning session.