EFForg / https-everywhere

A browser extension that encrypts your communications with many websites that offer HTTPS but still allow unencrypted connections.
https://eff.org/https-everywhere

Common ruleset bugs we've seen & suggested tests #460

Closed semenko closed 8 years ago

semenko commented 10 years ago

(Expanding on issue #451)

Some examples of common ruleset errors we might be able to test for.

<target host="www.df.eu" /> <------ Missing non-www target host to match rule.
<rule from="^http://(www\.)?df\.eu/" to="https://www.df.eu/"/>
semenko commented 10 years ago

Rules that do nothing:

<rule from="^https://ib24\.csob\.sk/" to="https://ib24.csob.sk/" />
semenko commented 10 years ago

Duplicate hosts with different file names (previously UBS.com.xml vs ubs.xml). These show up as warnings in the validate scripts, but we ignore them because there are so many (!).

We could set a limit on the number of duplicate hosts we allow to prevent this from growing.

e.g. "Failure: Currently 105 rules contain duplicate hosts, hard limit: 100. "

semenko commented 10 years ago

Lots of stylistic things could be standardized.

jsha commented 10 years ago

For duplicate hosts, there was a notion of a dups whitelist (--ignoredups) in trivial-validate that I broke in my refactoring, in part because it was inadequate and not kept up to date.

I think the right fix here is to make dups a hard fail, but have a file containing a whitelist of allowed duplicate hostnames. In order to pass tests, ruleset maintainers would have to add to the whitelist; whitelist modifications would be subject to extra scrutiny.
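A sketch of how the whitelist could plug into the duplicate-host count (the file name and the one-hostname-per-line format are just assumptions):

def load_duplicate_whitelist(path="test/duplicate-hosts-whitelist.txt"):
    # One hostname per line; blank lines and '#' comments ignored.
    with open(path) as f:
        return {line.strip() for line in f
                if line.strip() and not line.startswith("#")}

def unapproved_duplicates(duplicate_hosts, whitelist):
    # Anything duplicated but not explicitly whitelisted is a hard failure.
    return sorted(set(duplicate_hosts) - whitelist)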

semenko commented 10 years ago

From one of my own commits, negative exclusions that work better as actual rules:

<exclusion pattern="^http://www\.expedia\.com/(?!pub/|p/|.*Checkout)" /> vs <rule from="^https?://(?:www\.)?expedia\.com/(?=pub/|p/|.*Checkout)".....

Perhaps we should warn on (?!) inside exclusions.
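The check itself would be tiny (a sketch, not the real script):

import xml.etree.ElementTree as ET

def negative_lookahead_exclusions(ruleset_path):
    # Exclusions built around a negative lookahead usually read better as a
    # positive <rule>, so just flag any pattern containing "(?!".
    return [e.get("pattern") for e in ET.parse(ruleset_path).iter("exclusion")
            if "(?!" in (e.get("pattern") or "")]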

jsha commented 10 years ago

Agreed on the ?! exclusions thing, but I would say our tendency should be: only hard fails, no warnings. Otherwise the warnings pile up to the point of uselessness, as we've seen.

Also a note on your example rule: no reason for the ?=, you can just do a ?:.

semenko commented 10 years ago

Target doesn't match rules.

Current CATO ruleset: <target host="cato.com" /> (the rules are all cato.org).
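Sketch of a check for that, skipping wildcard targets to keep it simple (not the actual coverage test):

import re
import xml.etree.ElementTree as ET

def uncovered_targets(ruleset_path):
    # Flag target hosts that no rule's "from" pattern would ever rewrite.
    tree = ET.parse(ruleset_path)
    rules = [re.compile(r.get("from")) for r in tree.iter("rule")]
    uncovered = []
    for target in tree.iter("target"):
        host = target.get("host")
        if "*" in host:  # wildcard targets need fancier handling
            continue
        if not any(rule.match("http://%s/" % host) for rule in rules):
            uncovered.append(host)
    return uncovered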

fuzzyroddis commented 10 years ago

@semenko

Rules alternate between (www\.)? (capturing) and (?:www\.)? (non-capturing).

Perhaps test whether the capture group is actually used? I know some sites have bad SSL support on the apex domain and need (?:www\.) -> www., but others do support it, so (www\.) -> $1 would work.
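That could be as simple as comparing the number of capturing groups against the $n references in the replacement (sketch only):

import re
import xml.etree.ElementTree as ET

def unused_captures(ruleset_path):
    # A rule that captures a group its "to" string never references could use
    # (?:...) instead -- or keep the capture and rewrite to $1.
    flagged = []
    for rule in ET.parse(ruleset_path).iter("rule"):
        frm, to = rule.get("from"), rule.get("to")
        if re.compile(frm).groups and not re.findall(r"\$\d+", to):
            flagged.append((frm, to))
    return flagged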

fuzzyroddis commented 10 years ago

Another one:

HTTPS-to-HTTP redirects, e.g. https://www.ahm.com.au will redirect to http://www.ahm.com.au, causing an infinite loop.

To clarify, this can only be detected by actually making a request to https://www.ahm.com.au

If a rule such as

<rule from="^http://(?:www\.)?ahm\.com\.au/"
        to="https://www.ahm.com.au/" />

existed, it would cause an infinite loop due to the 301 redirect to http://www.ahm.com.au
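A sketch of that online check (just an assumption about how it might look; it obviously needs network access, and a real version would want retries and rate limiting):

import http.client
from urllib.parse import urlsplit

def downgrades_to_http(https_url):
    # Fetch the URL without following redirects and see whether the server
    # sends us straight back to http://, which a rewrite rule would then
    # bounce to https:// again, and so on forever.
    parts = urlsplit(https_url)
    conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    conn.request("HEAD", parts.path or "/")
    resp = conn.getresponse()
    location = resp.getheader("Location") or ""
    conn.close()
    return resp.status in (301, 302, 303, 307, 308) and location.startswith("http://")

print(downgrades_to_http("https://www.ahm.com.au/"))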

fuzzyroddis commented 10 years ago
<rule from="^http://(?:www\.)\.example\.com/"
        to="https://www.example.com/" />

Edit: just noticed a bug in my own example:

from="^http://(?:www\.)\.example\.com/" should be from="^http://(?:www\.)?\.example\.com/"

semenko commented 10 years ago

Adding to target doesn't match rules:

    <rule from="^http://((?:bugs|downloads|...[snip]...|www)\.)?php\.net/"
-       to="https://$1hp.net/" />

.php.net -> .hp.net :-1:
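A rewrite-and-check sketch that would have caught this (assumptions: $n backreferences translated to Python's \n, wildcard targets skipped, Python 3.5+ so unmatched optional groups substitute as empty strings):

import re
from urllib.parse import urlsplit
import xml.etree.ElementTree as ET

def suspicious_rewrites(ruleset_path):
    # Apply each rule to a sample URL for every plain target host and make
    # sure the rewritten host still belongs to the ruleset's targets.
    tree = ET.parse(ruleset_path)
    targets = [t.get("host") for t in tree.iter("target") if "*" not in t.get("host")]
    rules = [(re.compile(r.get("from")), r.get("to")) for r in tree.iter("rule")]
    flagged = []
    for host in targets:
        url = "http://%s/" % host
        for frm, to in rules:
            if not frm.match(url):
                continue
            # Rulesets use $1-style backreferences; re.sub wants \1.
            new_url = frm.sub(re.sub(r"\$(\d+)", r"\\\1", to), url)
            new_host = urlsplit(new_url).hostname or ""
            if not any(new_host == t or new_host.endswith("." + t) for t in targets):
                flagged.append((url, new_url))
            break
    return flagged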

jsha commented 9 years ago

Update: Per #984, we now have ruleset coverage testing that should catch a lot of these "rule doesn't match target host" problems.