MobilityData / gtfs-validator

Canonical GTFS Validator project for schedule (static) files.
https://gtfs-validator.mobilitydata.org/
Apache License 2.0

"load from url" fails with Granicus Website #1813

Open vevetron opened 2 months ago

vevetron commented 2 months ago

Describe the bug

Attempting to validate the Calabasas feed fails with "Error Processing Report".

I don't see anything in the browser inspector that suggests an error.

I think what could be happening is that Granicus blocks requests from cloud servers: we enter the URL, the MobilityData server requests the file, the request gets blocked, and we get an error.

Downloading the file and uploading it directly works.

Steps/Code to Reproduce

Go here: https://gtfs-validator.mobilitydata.org/

Enter this URL in "Load from a URL": Calabasas

Expected Results

It should process the GTFS feed.

Actual Results

"Error Processing Result"

Screenshots

No response

Files used

No response

Validator version

Can't tell - 9/9/2024 version

Operating system

Windows - Chrome

Java version

No response

Additional notes

No response

welcome[bot] commented 2 months ago

Thanks for opening your first issue in this project! If you haven't already, you can join our slack and join the #gtfs-validators channel to meet our awesome community. Come say hi :wave:!

Welcome to the community and thank you for your engagement in open source! :tada:

emmambd commented 2 months ago

Hi @vevetron - thanks for flagging this! We tested this and saw this notice from the Granicus website in our logs:

Access Denied
You don't have permission to access "http://www.cityofcalabasas.com/home/showpublisheddocument/31620/638611519891730000" on this server.
Reference #18.9369dc17.1725907184.2f7ad1eb
https://errors.edgesuite.net/18.9369dc17.1725907184.2f7ad1eb

We suspect this may be because the website blocks our user agent. The user agent we provide is shared here.

We'd suggest troubleshooting this on the Granicus website to verify whether this is the issue.

Let us know if there's anything else we can do to support with this problem.
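A producer could run that check from a shell, assuming curl is installed: request the file once with a browser-like User-Agent and once with the validator's, then compare the HTTP status codes. The helper names below are hypothetical. Because the server also appears to expect headers such as sec-ch-ua (per the curl test later in this thread), a matching User-Agent alone may not be enough to get a 200, so treat the verdict as a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helpers for checking whether a server filters requests by User-Agent.

# Print the final HTTP status code for URL ($1) requested with User-Agent ($2).
# Follows redirects (-L) and discards the body; curl prints 000 if no HTTP response arrived.
status_for_ua() {
  curl -sL -o /dev/null -w '%{http_code}' -A "$2" "$1"
}

# Interpret two status codes: browser-like User-Agent ($1) vs validator User-Agent ($2).
ua_verdict() {
  if [ "$1" = "200" ] && [ "$2" = "403" ]; then
    echo "likely blocked by User-Agent"
  elif [ "$1" = "403" ] && [ "$2" = "403" ]; then
    echo "blocked for both (IP address or other headers?)"
  elif [ "$2" = "200" ]; then
    echo "validator User-Agent allowed"
  else
    echo "inconclusive"
  fi
}

# Example (makes network requests, so it is commented out here):
# url='https://www.cityofcalabasas.com/home/showpublisheddocument/31620/638611519891730000'
# b=$(status_for_ua "$url" 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36')
# v=$(status_for_ua "$url" 'MobilityData GTFS-Validator/5.0.1 (Java 17.0.6)')
# ua_verdict "$b" "$v"
```

Keeping the network call and the interpretation in separate functions makes it easy to rerun the comparison from different machines (for example, a laptop versus a cloud VM) and see whether the result depends on where the request comes from.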

vevetron commented 2 months ago

We saw something similar when we made requests from Google Cloud. Options:

qcdyx commented 1 month ago

I tested the URL: it works in a browser, but a curl request fails to download the ZIP file because required headers, including User-Agent and sec-ch-ua, are missing. @emmambd, as part of the fix for this bug we could improve error handling and ask the engagement team to contact Granicus.

curl 'https://www.cityofcalabasas.com/home/showpublisheddocument/31620/638611519891730000' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'priority: u=0, i' \
  -H 'sec-ch-ua: "Chromium";v="130", "Google Chrome";v="130", "Not?A_Brand";v="99"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36' \
  -O
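For the better error handling mentioned above, one cheap signal is that a blocked request returns an HTML "Access Denied" page instead of a ZIP archive. A minimal sketch, assuming a POSIX shell: check the first four bytes for the ZIP local file header signature (50 4B 03 04, "PK\003\004") before handing the file to the validator. The function names are hypothetical, and this is not how the validator itself handles downloads.

```shell
#!/bin/sh
# Hypothetical helpers for telling a real GTFS ZIP apart from a block page.

# Succeeds if file $1 starts with the ZIP local file header signature 50 4B 03 04.
# (An empty ZIP archive starts with a different signature, but a GTFS feed always contains files.)
is_zip() {
  magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "504b0304" ]
}

# Print "zip", "blocked" (the body mentions Access Denied), or "unknown".
describe_download() {
  if is_zip "$1"; then
    echo "zip"
  elif grep -qi 'access denied' "$1" 2>/dev/null; then
    echo "blocked"
  else
    echo "unknown"
  fi
}

# Example (network access required, so commented out):
# curl -sSL -o feed.zip 'https://www.cityofcalabasas.com/home/showpublisheddocument/31620/638611519891730000'
# describe_download feed.zip
```

Reporting "blocked" separately would let a user see that the host refused the download, rather than a generic processing error.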

qcdyx commented 1 month ago

Hi @vevetron, could you please reach out to Granicus and ask them to whitelist MobilityData's header for GTFS validation? The header to whitelist is: user-agent: MobilityData GTFS-Validator/6.0.0 (Java 17.0.6). Please make sure it matches the correct validator version (you can check the version in the validation report).
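The string quoted above embeds both the validator version and the Java version, so an exact-match whitelist entry stops matching whenever either one changes. A minimal sketch, assuming the format shown here: build the string from the two versions and, as a hypothetical alternative for a firewall rule, match only the fixed prefix. Both function names are made up for illustration, and the Java version 17.0.6 is copied from the example rather than confirmed for any release.

```shell
#!/bin/sh
# Build the validator's User-Agent string, assuming the format quoted in this thread:
#   MobilityData GTFS-Validator/<validator version> (Java <Java version>)
validator_user_agent() {
  printf 'MobilityData GTFS-Validator/%s (Java %s)\n' "$1" "$2"
}

# Hypothetical prefix match: accepts the validator's User-Agent for any version,
# unlike an exact-string whitelist entry.
is_validator_ua() {
  case "$1" in
    'MobilityData GTFS-Validator/'*) return 0 ;;
    *) return 1 ;;
  esac
}

# The same prefix matches both versions discussed in this thread.
for v in 5.0.1 6.0.0; do
  ua=$(validator_user_agent "$v" 17.0.6)
  is_validator_ua "$ua" && echo "matches: $ua"
done
```

Any client can send this string, so both exact and prefix matches are easy to spoof; a secret custom header avoids that, at the cost of managing the secret.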

vevetron commented 1 month ago

I sent a first email to Granicus and cc'd @qcdyx on it. We can try. Y'all would probably need to set up a custom header that we keep secret. Also, the header would change each time the Java or validator version changed.

I had emailed the Calabasas website team previously and they never got back to me, so we might want to find an easier target. I'm guessing MobilityData will be blocked from all the Granicus websites.

qcdyx commented 1 month ago

Hey @vevetron, thanks for the update! Please also note that the current version is 5.0.1, which can be found in the validation report (https://gtfs-validator.mobilitydata.org/). Please also cc @emmambd and @davidgamez on your emails for visibility. I agree that Granicus websites may block MobilityData; I'll explore this further with the team and get back to you.

vevetron commented 1 month ago

I wonder what the 6.0.0 means.

Here are some of the CAS agencies we had trouble with:

City of Tracy | Granicus
City of West Hollywood | Granicus
City of Torrance | Likely Granicus
City of Glendale | Granicus
City of Lompoc | Granicus
City of Glendora | Granicus
City of Inglewood | Civic Plus

I tested Glendora through the MobilityData validator and it also failed, so I'm guessing the rest will as well.

qcdyx commented 1 month ago

Hey @vevetron Thanks for pointing that out! The "6.0.0" is actually a placeholder for the version we're currently working on for the GTFS Validator's next release. The current public version is 5.0.1, as I mentioned. I included 6.0.0 in the whitelist request to future-proof it for when the new release goes live. For now, we can proceed with the request using 5.0.1, and once 6.0.0 is released, we can update it if needed.

I tested the City of Tracy's URL (http://data.trilliumtransit.com/gtfs/tracy-ca-us/tracy-ca-us.zip) from the MobilityDatabase (https://mobilitydatabase.org/feeds/mdb-877), and it is working.

The City of Glendora URL (https://raw.githubusercontent.com/LACMTA/los-angeles-regional-gtfs/main/glendora-ca-us/glendora-ca-us.zip) listed on the MobilityDatabase (https://mobilitydatabase.org/feeds/mdb-609) returns a 404 when I try it in a browser.

Please continue testing the other cities using the URLs on the MobilityDatabase (https://mobilitydatabase.org/).

vevetron commented 1 month ago

It looks like Tracy hosts its GTFS in two places; this is the new one we have been using, which fails without a firewall exception.

vevetron commented 3 weeks ago

Question - does MobilityData use a stable IP address when downloading gtfs?

  1. A user goes to https://gtfs-validator.mobilitydata.org/ and enters a URL in "Load from a URL".
  2. MobilityData servers fetch that GTFS from, say, https://www.cityoftracy.org/home/showpublisheddocument/16626/638342536313270000. Is MobilityData's server IP static, or does it change with each request?

If the IP address is static, it would be easier to get a firewall passthrough approved for Granicus than to get the user agent whitelisted. (For CAL-ITP, our IPs in this case are ephemeral.)

vevetron commented 3 weeks ago

David says the servers run in the cloud and don't have stable ip addresses.

davidgamez commented 3 weeks ago

Hi @vevetron, yes, unfortunately we don't have a static IP that producers can rely on. As a follow-up, we will work on the different branches of this issue.

We are looking at having this implemented by the next release.

vevetron commented 1 day ago

Hi!

Can someone from your team test the new Granicus auth keys from your cloud server? Refer to the emails for the code.