CUTR-at-USF / transit-feed-quality-calculator

A tool that uses the gtfs-realtime-validator to calculate the quality of a large number of GTFS-realtime feeds

Redirects and HTTPS feeds aren't handled correctly #9

Closed by barbeau 6 years ago

barbeau commented 6 years ago

Summary:

It looks like some URLs from TransitFeeds.com redirect to HTTPS feeds, and the tool doesn't currently seem to handle those redirects correctly.

I believe this is the same issue that we saw in the gtfs-realtime-validator in https://github.com/CUTR-at-USF/gtfs-realtime-validator/issues/89, so we can apply the same solution here. The commit that fixed it in the validator is https://github.com/CUTR-at-USF/gtfs-realtime-validator/commit/180785d22ca58afa2463b322ad4e1b122c5f0a30.

Some examples that throw errors in the tool, but work fine when the URL is pasted into a browser:

Error reading GTFS-realtime feed 'http://transport.orgp.spb.ru/Portal/transport/internalapi/gtfs/realtime/vehicle' - java.io.IOException: Server returned HTTP response code: 502 for URL: http://transport.orgp.spb.ru/Portal/transport/internalapi/gtfs/realtime/vehicle
Error reading GTFS-realtime feed 'http://rtu.york.ca/gtfsrealtime/VehiclePositions' - java.io.FileNotFoundException: feeds\50-York, Toronto, ON, Canada\YRT\Viva Vehicle Positions-1508964601238.pb (The system cannot find the path specified)
Error reading GTFS-realtime feed 'http://rtu.york.ca/gtfsrealtime/TripUpdates' - java.io.FileNotFoundException: feeds\50-York, Toronto, ON, Canada\YRT\Viva Trip Updates-1508964601530.pb (The system cannot find the path specified)
Error reading GTFS feed 'http://oregon-gtfs.com/gtfs_data/citytocityshuttle-or-us/citytocityshuttle-or-us.zip' - java.io.IOException: Server returned HTTP response code: 403 for URL: http://oregon-gtfs.com/gtfs_data/citytocityshuttle-or-us/citytocityshuttle-or-us.zip
Error reading GTFS feed 'http://trilliumtransit.com/transit_feeds/tidelinewatertaxi-ca-us/tidelinewatertaxi-ca-us.zip' - java.io.IOException: Server returned HTTP response code: 403 for URL: http://trilliumtransit.com/transit_feeds/tidelinewatertaxi-ca-us/tidelinewatertaxi-ca-us.zip
Error reading GTFS feed 'http://oregon-gtfs.com/gtfs_data/eugenetocoosbay-or-us/eugenetocoosbay-or-us.zip' - java.io.IOException: Server returned HTTP response code: 403 for URL: http://oregon-gtfs.com/gtfs_data/eugenetocoosbay-or-us/eugenetocoosbay-or-us.zip
Error reading GTFS feed 'http://oregon-gtfs.com/gtfs_data/pacificcrest-or-us/pacificcrest-or-us.zip' - java.io.IOException: Server returned HTTP response code: 403 for URL: http://oregon-gtfs.com/gtfs_data/pacificcrest-or-us/pacificcrest-or-us.zip
Error reading GTFS feed 'http://oregon-gtfs.com/gtfs_data/cascadespoint-or-us/cascadespoint-or-us.zip' - java.io.IOException: Server returned HTTP response code: 403 for URL: http://oregon-gtfs.com/gtfs_data/cascadespoint-or-us/cascadespoint-or-us.zip
https://www.ripta.com/stuff/contentmgr/files/0/3cda81dfa140edbe9aae214b26245b4a/files/google_transit.zip
Error reading GTFS feed 'http://www.viainfo.net/BusService/google_transit.zip' - java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.viainfo.net/BusService/google_transit.zip

Steps to reproduce:

Run the tool

Expected behavior:

The above feeds should download correctly

Observed behavior:

We get the above errors

Platform:

Windows 7 Enterprise with jdk1.8.0_73

barbeau commented 6 years ago

I'll handle this.

barbeau commented 6 years ago

This needs to be done for both GTFS and GTFS-realtime.

barbeau commented 6 years ago

Based on the GTFS-realtime validator, it looks like MBTA is using a redirect from http to https:

[qtp429804587-21] INFO edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed - Downloading GTFS data from http://www.mbta.com/uploadedfiles/MBTA_GTFS.zip...
[qtp429804587-21] WARN edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed - Redirecting to https://www.mbta.com/uploadedfiles/MBTA_GTFS.zip
[qtp429804587-21] INFO edu.usf.cutr.gtfsrtvalidator.api.resource.GtfsFeed - GTFS data downloaded successfully
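
The behavior in the log above could be sketched roughly as follows. This is only an illustration, not the code from the referenced commit: `RedirectSketch`, `isRedirect`, `resolveLocation`, `openFollowingRedirects`, and the limit of five hops are all hypothetical. It relies on the fact that `java.net.HttpURLConnection` does not automatically follow redirects that change protocol (for example http to https), so the `Location` header has to be followed by hand.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper for illustration; not the implementation from the referenced commit.
class RedirectSketch {
    // Assumed cap on redirect hops, to avoid looping forever on a redirect cycle.
    static final int MAX_REDIRECTS = 5;

    /** Returns true for HTTP status codes that signal a redirect. */
    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || status == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || status == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || status == 307
            || status == 308;
    }

    /** Resolves a Location header (absolute or relative) against the URL that returned it. */
    static URL resolveLocation(URL current, String location) throws MalformedURLException {
        return new URL(current, location);
    }

    /** Opens a stream to the URL, manually following redirects, including http to https. */
    static InputStream openFollowingRedirects(URL url) throws IOException {
        URL current = url;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            URLConnection connection = current.openConnection();
            if (!(connection instanceof HttpURLConnection)) {
                // Not HTTP(S), so there is no status code to inspect.
                return connection.getInputStream();
            }
            HttpURLConnection http = (HttpURLConnection) connection;
            http.setInstanceFollowRedirects(false); // handle every redirect ourselves
            int status = http.getResponseCode();
            if (!isRedirect(status)) {
                // Success returns the body; error codes such as 403 still throw IOException here.
                return http.getInputStream();
            }
            String location = http.getHeaderField("Location");
            http.disconnect();
            if (location == null) {
                throw new IOException("Redirect without Location header from " + current);
            }
            current = resolveLocation(current, location);
        }
        throw new IOException("Too many redirects for " + url);
    }
}
```

A sketch like this would only address the redirect case. The 403 and 502 responses and the `FileNotFoundException` entries listed earlier may have other causes.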

URLs: