GSA-TTS / tts-tech-operations

Home of the TTS Technology Portfolio team
https://handbook.tts.gsa.gov/tech-operations/

indicate whether subdomains are live or not #394

Closed: afeld closed this issue 4 years ago

afeld commented 4 years ago

Background information

The Digital Council is going to be doing a data call for each website at TTS. We are going to be working off of this list of subdomains, which at 426 (at present) is way too many to deal with. Many of those sites have been transferred and/or are no longer up, and the Status column isn't accurate.

This should be automated so that we can do a better job of checking it on an ongoing basis, though ongoing checking isn't a requirement for this issue.

cc https://github.com/18F/tts-tech-portfolio/issues/363 https://github.com/18F/dns/issues/412

User stories

Implementation

Acceptance criteria

afeld commented 4 years ago

Conversation about combining efforts.

timothy-spencer commented 4 years ago

OK. I got your list of sites working in my sandbox, and you can find all the sites that were unreachable (DNS problem, behind a firewall, host is not alive, etc.: anything where we did not get back an HTTP status code) with this:

curl -s 'https://scanner-ui-reliable-kob-bb.app.cloud.gov/api/v1/scans/dap/?data.status_code=lt:0' | jq -r '.[] | .domain'

The code is in production, but the scan hasn't been run yet on your new domains. Starting tomorrow, you can use the real URL:

curl -s 'https://site-scanning.app.cloud.gov/api/v1/scans/dap/?data.status_code=lt:0' | jq -r '.[] | .domain'

Check it out and let me know if this works for you!

timothy-spencer commented 4 years ago

Another query that might be interesting is the one for non-200 responses, which can be found with this:

curl -s 'https://scanner-ui-reliable-kob-bb.app.cloud.gov/api/v1/scans/dap/?data.status_code=gt:200' | jq -r '.[] | (.data.status_code | tostring) + " " + .domain'

Again, tomorrow, you will be able to do this against the production site rather than my sandbox:

curl -s 'https://site-scanning.app.cloud.gov/api/v1/scans/dap/?data.status_code=gt:200' | jq -r '.[] | (.data.status_code | tostring) + " " + .domain'
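
For anyone who would rather post-process the scan data locally than rely on the server-side `lt:0` and `gt:200` filters, here is a rough Python sketch of what the two jq expressions do. It assumes only the record shape implied by those expressions (a top-level `domain` and a nested `data.status_code`). The sample records and the `example.gov` domains are made up for illustration.

```python
import json


def unreachable_domains(scans):
    """Domains where no HTTP status code came back (status_code < 0).

    Mirrors the `lt:0` filter followed by jq's `.[] | .domain`.
    """
    return [s["domain"] for s in scans if s["data"]["status_code"] < 0]


def non_200_lines(scans):
    """'<status> <domain>' lines for status codes above 200.

    Mirrors the `gt:200` filter followed by jq's
    `(.data.status_code | tostring) + " " + .domain`.
    """
    return [
        f'{s["data"]["status_code"]} {s["domain"]}'
        for s in scans
        if s["data"]["status_code"] > 200
    ]


# Made-up sample records, not real scan results.
SAMPLE_SCANS = json.loads("""
[
  {"domain": "alive.example.gov",    "data": {"status_code": 200}},
  {"domain": "moved.example.gov",    "data": {"status_code": 301}},
  {"domain": "missing.example.gov",  "data": {"status_code": 404}},
  {"domain": "dead.example.gov",     "data": {"status_code": -1}}
]
""")

print("\n".join(unreachable_domains(SAMPLE_SCANS)))
print("\n".join(non_200_lines(SAMPLE_SCANS)))
```

Doing the filtering client-side like this means one fetch of the scan data can feed both reports.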

afeld commented 4 years ago

Results from Synthetics.

timothy-spencer commented 4 years ago

One thing to note about those API queries is that because the subdomain list is large, we can no longer support returning the full list of scans. I've had to require pagination for queries like this, so the queries above won't work anymore.

It shouldn't be hard to adapt one of the scripts in https://github.com/18F/site-scanning/tree/master/tools to work for you, though! LMK if you need something like that.
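
As a starting point, a paginated client might look something like the sketch below. This is not taken from the scripts in the tools directory. The response shape is an assumption: it supposes each page is a JSON object with a `results` list and a `next` URL that is null on the last page, so check that against what the API actually returns before relying on it. `fetch_json` and `iter_scans` are hypothetical names.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode its body as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_scans(start_url, fetch=fetch_json):
    """Yield scan records across all pages.

    ASSUMPTION: each page looks like {"results": [...], "next": <url or null>}.
    The real API's pagination format may differ.
    """
    url = start_url
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")  # None (or missing) ends the loop


# Example use against the production URL from earlier in the thread:
#
#   url = "https://site-scanning.app.cloud.gov/api/v1/scans/dap/?data.status_code=lt:0"
#   for scan in iter_scans(url):
#       print(scan["domain"])
```

The `fetch` parameter is there so the paging logic can be exercised with canned pages instead of live requests.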

afeld commented 4 years ago

Based on the Synthetics data, there were only four or five in the spreadsheet whose System Status was incorrectly labeled. Fixed!