benbalter / site-inspector

Ruby Gem to sniff information about a domain's technology and capabilities.
https://site-inspector.herokuapp.com
MIT License

downgrades_https logic doesn't make sense #84

Open garrettr opened 8 years ago

garrettr commented 8 years ago

As described in https://github.com/benbalter/site-inspector/issues/83#issuecomment-219479551:

> A site is only downgraded from HTTPS to HTTP when HTTPS is supported, but the canonical endpoint downgrades to HTTP.

This is not a useful definition of "downgrades HTTPS". Any site that supports HTTPS but redirects to HTTP should be considered to "downgrade HTTPS".

For example, take https://nytimes.com:

~> curl -v https://nytimes.com
* Rebuilt URL to: https://nytimes.com/
*   Trying 170.149.159.130...
* Connected to nytimes.com (170.149.159.130) port 443 (#0)
<snip>
> GET / HTTP/1.1
> Host: nytimes.com
> User-Agent: curl/7.43.0
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Server: Varnish
< Location: http://www.nytimes.com/
<snip>

Here, the "canonical" endpoint is http://www.nytimes.com. Requests to https://{www.,}nytimes.com always redirect to http://www.nytimes.com. Clearly the site downgrades HTTPS. However:

$ irb
irb(main):001:0> require 'site-inspector'
=> true
irb(main):002:0> site = SiteInspector.inspect "nytimes.com"
=> #<SiteInspector::Domain host="nytimes.com">
irb(main):003:0> site.https?
=> true
irb(main):004:0> site.downgrades_https?
=> false
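The proposed definition can be made concrete with a minimal Ruby sketch. It assumes a hypothetical hash that maps each endpoint URL to the absolute URL it redirects to (nil when it does not redirect). The entries come from the nytimes.com behavior described above, not from a live scan. `NYTIMES_ENDPOINTS`, `https_url?` and `downgrades_https_any_endpoint?` are illustrative names and not part of site-inspector.

```ruby
require 'uri'

# Hypothetical model, not site-inspector's API: endpoint URL => Location it
# redirects to (assumed absolute), or nil if it serves content directly.
# Entries follow the nytimes.com behavior reported in this issue.
NYTIMES_ENDPOINTS = {
  'https://nytimes.com'     => 'http://www.nytimes.com',
  'https://www.nytimes.com' => 'http://www.nytimes.com',
  'http://www.nytimes.com'  => nil
}.freeze

def https_url?(url)
  URI(url).scheme == 'https'
end

# Proposed rule: the site "downgrades HTTPS" if any HTTPS endpoint
# redirects directly to a plain-HTTP URL.
def downgrades_https_any_endpoint?(endpoints)
  endpoints.any? do |url, location|
    https_url?(url) && !location.nil? && URI(location).scheme == 'http'
  end
end

puts downgrades_https_any_endpoint?(NYTIMES_ENDPOINTS)
```

Under this rule nytimes.com is flagged, which is the opposite of the `false` that `site.downgrades_https?` returns in the session above.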

konklone commented 8 years ago

@garrettr Are you using 1.0.2 (which is what pulse.cio.gov uses), or 3.X (which is what the comment you link to refers to)? It looks like you're using 3.X, which I've not personally thoroughly tested or deployed.

1.0.2 uses the same documented rule, that it only considers the canonical endpoint: https://github.com/benbalter/site-inspector/blob/erics-mode/lib/site-inspector.rb#L346-L358

But as you demonstrated, nytimes.com's canonical endpoint redirects to HTTP, so it should be marked as downgrading HTTPS either way. And in 1.0.2, it does appear to do so:

$ site-inspector nytimes.com --http

Yields, among the JSON results:

"downgrade_https": true,

This seems to be a regression in 3.X.

I do think the definition of "downgrades HTTPS", when applied to a "domain", should only consider the canonical endpoint. If the NYT's canonical endpoint were https://www.nytimes.com and it didn't redirect down to HTTP, but for some reason https://nytimes.com redirected down to http://nytimes.com (perhaps as an intermediate redirect to the www version), then I would want to say that the nytimes.com domain "enforces HTTPS", even though it's possible to access it in a non-canonical way over HTTP.

garrettr commented 8 years ago

@konklone We were trying to use 3.x, but it looks like we'll have to take your advice and use 1.0.2 since we need a useful interpretation of "downgrades HTTPS".

benbalter commented 8 years ago

To be clear, the 3.x branch is the latest release, and is the only one currently supported. I'd be glad to get a fix together for the 3.x branch if there is a regression or should be a change in behavior.

@garrettr, @konklone, what is the expected/preferred behavior here? Logically, if an endpoint redirects from HTTPS to HTTP, it, by definition, wouldn't be the canonical endpoint. Perhaps the definition should be whether any endpoint redirects from HTTPS to HTTP? Alternatively, we could look only at whether the HTTPS equivalent of the canonical domain redirects to HTTP.
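The point about the canonical endpoint can be checked with a simplified model. The Ruby sketch below assumes that "canonical" just means the endpoint where a redirect chain stops. This is much cruder than whatever site-inspector actually does to pick its canonical endpoint. The redirect hash and the method names are hypothetical.

```ruby
require 'uri'

# Hypothetical redirect map (URL => Location, nil if no redirect),
# following the nytimes.com behavior reported in this issue.
NYT_REDIRECTS = {
  'https://nytimes.com'     => 'http://www.nytimes.com',
  'https://www.nytimes.com' => 'http://www.nytimes.com',
  'http://www.nytimes.com'  => nil
}.freeze

# Simplified notion of "canonical": follow redirects from start until
# reaching an endpoint that does not redirect. Gives up after max_hops
# so a redirect loop cannot run forever.
def canonical_endpoint(start, redirects, max_hops: 10)
  url = start
  max_hops.times do
    target = redirects[url]
    return url if target.nil?
    url = target
  end
  raise ArgumentError, "too many redirects from #{start}"
end

# The rule as quoted in the issue: HTTPS is downgraded only if the
# canonical endpoint is HTTPS and itself redirects to HTTP.
def canonical_endpoint_downgrades?(start, redirects)
  canonical = canonical_endpoint(start, redirects)
  target = redirects[canonical]
  URI(canonical).scheme == 'https' && !target.nil? && URI(target).scheme == 'http'
end

puts canonical_endpoint('https://nytimes.com', NYT_REDIRECTS)
puts canonical_endpoint_downgrades?('https://nytimes.com', NYT_REDIRECTS)
```

Because `canonical_endpoint` only stops at an endpoint with no redirect, `target` is always nil at that point, so in this model the rule can never fire.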

What if it redirects from HTTPS to HTTPS, but with an HTTP intermediary? E.g.:

  1. https://nytimes.com redirects to http://www.nytimes.com
  2. http://www.nytimes.com redirects to https://www.nytimes.com

Edit: wrong @garrettr

garethr commented 8 years ago

@benbalter wrong person beginning with g :) I think you meant @garrettr

benbalter commented 8 years ago

Eek. Thanks @garethr. Sorry for the noise.

konklone commented 8 years ago

For reference, DHS has begun a Python-based tool that tackles only this issue, of analyzing HTTPS behavior on domains to come up with the same conclusions that are currently on pulse.cio.gov:

https://github.com/dhs-ncats/pshtt

Our team at 18F is contributing, and hoping to end up using the same toolchain as DHS. The code isn't ready for use in Pulse yet, but it's seeing active work and should hopefully get there soon.

konklone commented 8 years ago

Some starting work on integrating to domain-scan is here: https://github.com/18F/domain-scan/pull/76

But as I say in it, there's still some more work to go before they're at feature and logical parity.

benbalter commented 8 years ago

> the same conclusions that are currently on pulse.cio.gov

@konklone could I trouble you to elaborate on those, in reference to the above question (regarding when HTTPS is downgraded)?

garrettr commented 8 years ago

> what is the expected/preferred behavior here?

I'm not sure if I fully understand how site-inspector determines the "canonical" endpoint. I think that if a site has an HTTPS endpoint, but redirects it to HTTP, then it should be considered to "downgrade HTTPS".

> What if it redirects from HTTPS to HTTPS, but with an HTTP intermediary?

Any redirect from HTTPS to HTTP is problematic for security, even if it eventually re-redirects back to HTTPS. I would say that if a site ever redirects an HTTPS endpoint to an HTTP endpoint, it should be considered to "downgrade HTTPS".

Is that clear @benbalter?
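garrettr's answer to the intermediary question amounts to checking every hop of a redirect chain, not just its endpoints. Here is a minimal Ruby sketch that uses the hypothetical HTTPS to HTTP to HTTPS chain from benbalter's example. The redirect data and the method names are illustrative only.

```ruby
require 'uri'

# Hypothetical redirect map for benbalter's scenario:
# https://nytimes.com -> http://www.nytimes.com -> https://www.nytimes.com
HTTPS_HTTP_HTTPS = {
  'https://nytimes.com'     => 'http://www.nytimes.com',
  'http://www.nytimes.com'  => 'https://www.nytimes.com',
  'https://www.nytimes.com' => nil
}.freeze

# Return every URL visited while following redirects from start.
def redirect_chain(start, redirects, max_hops: 10)
  chain = [start]
  max_hops.times do
    target = redirects[chain.last]
    return chain if target.nil?
    chain << target
  end
  raise ArgumentError, "too many redirects from #{start}"
end

# Any single hop from an HTTPS URL to an HTTP URL counts as a downgrade,
# even if a later hop returns to HTTPS.
def chain_downgrades_https?(chain)
  chain.each_cons(2).any? do |from, to|
    URI(from).scheme == 'https' && URI(to).scheme == 'http'
  end
end

chain = redirect_chain('https://nytimes.com', HTTPS_HTTP_HTTPS)
puts chain.join(' -> ')
puts chain_downgrades_https?(chain)
```

The chain ends on HTTPS, but the middle hop still exposes the request over plain HTTP, so this rule flags it.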

benbalter commented 8 years ago

> if a site has an HTTPS endpoint, but redirects it to HTTP, then it should be considered to "downgrade HTTPS".

👍

konklone commented 8 years ago

Our methodology is slightly different -- we consider a site as downgrading HTTPS if its "canonical" endpoint downgrades to HTTP: https://github.com/dhs-ncats/pshtt/blob/master/pshtt.py#L541-L563

benbalter commented 8 years ago

@konklone See above:

> if an endpoint redirects from HTTPS to HTTP, it, by definition, wouldn't be the canonical endpoint.

konklone commented 8 years ago

Sorry, I meant if its canonical hostname redirects from HTTPS to HTTP.
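konklone's clarified rule looks only at the HTTPS endpoint of the canonical hostname. The Ruby sketch below assumes the canonical hostname is already known, since working it out is a separate problem. It runs the rule on two hypothetical redirect maps. The first is the nytimes.com behavior reported in this issue. The second is konklone's earlier hypothetical, in which https://nytimes.com drops to http://nytimes.com. The thread does not say where that intermediate hop goes next, so `https://www.nytimes.com` is an assumption. This is an illustration, not pshtt's code.

```ruby
require 'uri'

# nytimes.com as reported in this issue (URL => Location, nil if none).
NYT_CURRENT = {
  'https://nytimes.com'     => 'http://www.nytimes.com',
  'https://www.nytimes.com' => 'http://www.nytimes.com',
  'http://www.nytimes.com'  => nil
}.freeze

# konklone's hypothetical: the canonical https://www.nytimes.com stays on
# HTTPS, but https://nytimes.com passes through http://nytimes.com first.
# The http://nytimes.com target is an assumption for illustration.
NYT_HYPOTHETICAL = {
  'https://nytimes.com'     => 'http://nytimes.com',
  'http://nytimes.com'      => 'https://www.nytimes.com',
  'https://www.nytimes.com' => nil
}.freeze

# Rule: check only whether the HTTPS endpoint of the canonical hostname
# redirects to plain HTTP. Other hostnames are ignored.
def canonical_hostname_downgrades?(canonical_host, redirects)
  target = redirects["https://#{canonical_host}"]
  !target.nil? && URI(target).scheme == 'http'
end

puts canonical_hostname_downgrades?('www.nytimes.com', NYT_CURRENT)
puts canonical_hostname_downgrades?('www.nytimes.com', NYT_HYPOTHETICAL)
```

The hypothetical site still contains an HTTPS-to-HTTP hop on the bare hostname. garrettr's any-hop rule would therefore flag it, but this rule does not, and that is the disagreement in this thread.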