garrettr opened this issue 8 years ago
@garrettr Are you using 1.0.2 (which is what pulse.cio.gov uses) or 3.X (which is what the comment you link to refers to)? It looks like you're using 3.X, which I haven't personally tested thoroughly or deployed.
1.0.2 uses the same documented rule, considering only the canonical endpoint: https://github.com/benbalter/site-inspector/blob/erics-mode/lib/site-inspector.rb#L346-L358
But as you demonstrated, nytimes.com's canonical endpoint redirects to HTTP, so it should be marked as downgrading HTTPS either way. And in 1.0.2, it does appear to do so:
$ site-inspector nytimes.com --http
Yields, among the JSON results:
"downgrade_https": true,
This seems to be a regression in 3.X.
I do think the definition of "downgrades HTTPS", when applied to a "domain", should consider only the canonical endpoint. If the NYT's canonical endpoint were https://www.nytimes.com and it didn't redirect down to HTTP, but for some reason https://nytimes.com redirected down to http://nytimes.com (perhaps as an intermediate hop on the way to the www version), then I would still say that the nytimes.com domain "enforces HTTPS", even though it's possible to access it in a non-canonical way over HTTP.
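To make the canonical-only definition concrete, here is a minimal Python sketch. It is not site-inspector's implementation: it assumes a hypothetical table mapping each endpoint URL to the URL it redirects to (or `None`), filled in with the hypothetical NYT scenario from the previous paragraph rather than real data. The function names are illustrative.

```python
from urllib.parse import urlsplit

# Hypothetical endpoint table for the scenario described above (not real
# NYT behavior): each URL maps to the URL it redirects to, or None if it
# serves content directly.
SCENARIO = {
    "https://www.nytimes.com": None,                # canonical endpoint
    "http://www.nytimes.com": "https://www.nytimes.com",
    "https://nytimes.com": "http://nytimes.com",    # non-canonical downgrade
    "http://nytimes.com": "https://www.nytimes.com",
}


def is_downgrade(source, target):
    """True if a single redirect hop goes from an HTTPS URL to an HTTP URL."""
    return (
        target is not None
        and urlsplit(source).scheme == "https"
        and urlsplit(target).scheme == "http"
    )


def downgrades_https_canonical_only(redirects, canonical):
    """Canonical-only rule: ignore every endpoint except the canonical one."""
    return is_downgrade(canonical, redirects.get(canonical))


print(downgrades_https_canonical_only(SCENARIO, "https://www.nytimes.com"))  # False
```

The downgrade hop at https://nytimes.com is present in the table but deliberately ignored, which is the outcome argued for above.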
@konklone We were trying to use 3.x, but it looks like we'll have to take your advice and use 1.0.2 since we need a useful interpretation of "downgrades HTTPS".
To be clear, the 3.x branch is the latest release and the only one currently supported. I'd be glad to put together a fix for the 3.x branch if there's a regression or if the behavior should change.
@garrettr, @konklone what is the expected/preferred behavior here? Logically, if an endpoint redirects from HTTPS to HTTP, it, by definition, wouldn't be the canonical endpoint. Perhaps the definition should be whether any endpoint redirects from HTTPS to HTTP? Alternatively, we could check only whether the HTTPS equivalent of the canonical domain redirects to HTTP.
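The two alternatives proposed above can be compared side by side. This is a minimal sketch under the same assumption as before (a hypothetical URL-to-redirect-target table, not live requests); the function names are illustrative and not part of site-inspector. `NYT` encodes the behavior reported in the original issue (both HTTPS hostnames redirect to the canonical http://www.nytimes.com), and `SCENARIO` is the hypothetical variant in which only a non-canonical HTTPS endpoint downgrades.

```python
from urllib.parse import urlsplit


def is_downgrade(source, target):
    """True if a single redirect hop goes from an HTTPS URL to an HTTP URL."""
    return (
        target is not None
        and urlsplit(source).scheme == "https"
        and urlsplit(target).scheme == "http"
    )


def downgrades_any_endpoint(redirects):
    """Alternative 1: flag the domain if ANY endpoint redirects HTTPS -> HTTP."""
    return any(is_downgrade(src, dst) for src, dst in redirects.items())


def downgrades_canonical_https_twin(redirects, canonical):
    """Alternative 2: check only the HTTPS version of the canonical endpoint."""
    twin = urlsplit(canonical)._replace(scheme="https").geturl()
    return is_downgrade(twin, redirects.get(twin))


# Behavior reported in the original issue: canonical is http://www.nytimes.com.
NYT = {
    "http://www.nytimes.com": None,
    "https://www.nytimes.com": "http://www.nytimes.com",
    "https://nytimes.com": "http://www.nytimes.com",
}

# Hypothetical variant: canonical is HTTPS; only https://nytimes.com downgrades.
SCENARIO = {
    "https://www.nytimes.com": None,
    "https://nytimes.com": "http://nytimes.com",
    "http://nytimes.com": "https://www.nytimes.com",
}

print(downgrades_any_endpoint(NYT),
      downgrades_canonical_https_twin(NYT, "http://www.nytimes.com"))
# True True
print(downgrades_any_endpoint(SCENARIO),
      downgrades_canonical_https_twin(SCENARIO, "https://www.nytimes.com"))
# True False
```

Both rules flag the reported NYT behavior, but they disagree on the hypothetical variant, which is exactly where the definitions in this thread diverge.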
What if it redirects from HTTPS to HTTPS, but with an HTTP intermediary? E.g.:
https://nytimes.com redirects to http://www.nytimes.com
http://www.nytimes.com redirects to https://www.nytimes.com
Edit: Wrong @garrettr
@benbalter wrong person beginning with g :) I think you meant @garrettr
Eek. Thanks @garethr. Sorry for the noise.
For reference, DHS has started a Python-based tool that tackles only this problem: analyzing the HTTPS behavior of domains to reach the same conclusions that are currently on pulse.cio.gov:
https://github.com/dhs-ncats/pshtt
Our team at 18F is contributing, and hoping to end up using the same toolchain as DHS. The code isn't ready for use in Pulse yet, but it's seeing active work and should hopefully get there soon.
Some initial work on integrating it into domain-scan is here:
https://github.com/18F/domain-scan/pull/76
But as I say in it, there's still some more work to go before they're at feature and logical parity.
the same conclusions that are currently on pulse.cio.gov
@konklone could I trouble you to elaborate on those, in reference to the above question (regarding when HTTPS is downgraded)?
what is the expected/preferred behavior here?
I'm not sure if I fully understand how site-inspector determines the "canonical" endpoint. I think that if a site has an HTTPS endpoint, but redirects it to HTTP, then it should be considered to "downgrade HTTPS".
What if it redirects from HTTPS to HTTPS, but with an HTTP intermediary?
Any redirect from HTTPS to HTTP is problematic for security, even if it eventually re-redirects back to HTTPS. I would say that if a site ever redirects an HTTPS endpoint to an HTTP endpoint, it should be considered to "downgrade HTTPS".
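Under this stricter rule, the whole redirect chain matters, not just the first hop or the final destination. A minimal sketch, again assuming a hypothetical URL-to-redirect-target table rather than live HTTP requests: it follows each chain (with a hop limit and a loop guard) and flags any HTTPS-to-HTTP hop, using the hypothetical intermediary example posed earlier in the thread. The function names are illustrative.

```python
from urllib.parse import urlsplit

# Hypothetical intermediary example from the thread: HTTPS -> HTTP -> HTTPS.
REDIRECTS = {
    "https://nytimes.com": "http://www.nytimes.com",
    "http://www.nytimes.com": "https://www.nytimes.com",
    "https://www.nytimes.com": None,  # final page, no further redirect
}


def redirect_chain(redirects, start, max_hops=10):
    """Follow redirects from start and return every URL visited, in order."""
    chain = [start]
    for _ in range(max_hops):
        nxt = redirects.get(chain[-1])
        if nxt is None or nxt in chain:  # stop at a final page or a loop
            break
        chain.append(nxt)
    return chain


def chain_downgrades(chain):
    """True if any consecutive hop goes from HTTPS to HTTP."""
    return any(
        urlsplit(a).scheme == "https" and urlsplit(b).scheme == "http"
        for a, b in zip(chain, chain[1:])
    )


def downgrades_https_strict(redirects):
    """Flag the domain if ANY HTTPS endpoint's chain passes through HTTP."""
    return any(
        chain_downgrades(redirect_chain(redirects, url))
        for url in redirects
        if urlsplit(url).scheme == "https"
    )


print(redirect_chain(REDIRECTS, "https://nytimes.com"))
# ['https://nytimes.com', 'http://www.nytimes.com', 'https://www.nytimes.com']
print(downgrades_https_strict(REDIRECTS))  # True, although the chain ends on HTTPS
```

Walking the full chain, instead of checking only the first redirect or only where it ends up, is what catches the HTTPS to HTTP to HTTPS case; a real scanner would populate the table from actual HTTP responses.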
Is that clear @benbalter?
if a site has an HTTPS endpoint, but redirects it to HTTP, then it should be considered to "downgrade HTTPS".
👍
Our methodology is slightly different. We consider a site to downgrade HTTPS if its "canonical" endpoint downgrades to HTTP: https://github.com/dhs-ncats/pshtt/blob/master/pshtt.py#L541-L563
@konklone See above:
if an endpoint redirects from HTTPS to HTTP, it by definition, wouldn't be the canonical endpoint.
Sorry, I meant if its canonical hostname redirects from HTTPS to HTTP.
As described in https://github.com/benbalter/site-inspector/issues/83#issuecomment-219479551:
This is not a useful definition of "downgrades HTTPS". Any site that supports HTTPS but redirects to HTTP should be considered to "downgrade HTTPS".
For example, take https://nytimes.com:
Here, the "canonical" endpoint is http://www.nytimes.com. Requests to https://{www.,}nytimes.com always redirect to http://www.nytimes.com. Clearly the site downgrades HTTPS. However: