EFForg / https-everywhere

A browser extension that encrypts your communications with many websites that offer HTTPS but still allow unencrypted connections.
https://eff.org/https-everywhere
Other
3.37k stars 1.09k forks source link

Unnecessary overlapping targets in rulesets #12322

Closed RReverser closed 6 years ago

RReverser commented 7 years ago

https://github.com/EFForg/https-everywhere/pull/12320 helped to find a list of rulesets where targets overlap, and so unnecessarily duplicate each other. Reducing number of those might help with #12232:

cc @Hainish @cschanaj @koops76 @Bisaloo

RReverser commented 7 years ago

Updated to group output, it's less noisy this way.

RReverser commented 7 years ago

Not sure if I should attempt to remove these automatically or going manually one by one will be easier...

Bisaloo commented 7 years ago

For some of them, conceptually, it kind of makes sense to be that way. For example, UserEcho.com.xml lists "actual" subdomains first and then the wildcard is used to cover customer subdomains.

What I like about it is that the day custom subdomains support is removed, we are just one line away to fix this ruleset, we don't have to find and check every "regular" subdomain again.

A similar alternative would be:

    <target host="userecho.com"/>
    <target host="*.userecho.com"/>

        <!-- All regular subdomains -->
        <test url="http://www.userecho.com/"/>
        <test url="http://blog.userecho.com/"/>
        <test url="http://feedback.userecho.com/"/>

        <!-- Customer subdomains examples -->
        <test url="http://imgur.userecho.com/"/>
        <test url="http://unchecky.userecho.com/"/>

But I have no idea how big an impact this overlapping targets have on performance.

Bisaloo commented 7 years ago

That's interesting, looking at https://github.com/EFForg/https-everywhere/pull/12320/commits/cdc90c422a7a0addfefa37a744286bef7333227d, it looks like most of the overlaps are due to the fact that the ruleset author didn't account for the fact that left wildcards affect more than first-level subdomains.

RReverser commented 7 years ago

Well, I noticed them as part of hunt for wildcard-in-the-middle regexps, and thought that it makes sense to remove overlaps completely as they're somewhat confusing.

On the other hand, I see your point too, even though 1) *.somehost hosts are better for both CPU and memory performance than long lists of precise subdomains (especially with reverse-trie approach) and 2) it's future-proof for cases when domain gets new subdomains that also need http: -> https: rewrite and so I'd rather wish to keep these than long lists of specific targets.

But I know not everyone in HTTPS Everywhere team agrees with this 😄

Anyway, in cases where they overlap, I think using <test url /> makes more sense if it's just for documentation purposes, although by the time custom subdomains are gone, ruleset will likely have to update list of targets anyway as new subdomains might appear or host might get HSTS etc.

RReverser commented 7 years ago

That's interesting, looking at cdc90c4, it looks like most of the overlaps are due to the fact that the ruleset author didn't account for the fact that left wildcards affect more than first-level subdomains.

Yup, I got the same impression.

cschanaj commented 7 years ago

But I have no idea how big an impact this overlapping targets have on performance.

AFAIK, the lookup performance will be improved as the complicated operations in rules.js#L347-L354 can be avoid.

That's interesting, looking at cdc90c4, it looks like most of the overlaps are due to the fact that the ruleset author didn't account for the fact that left wildcards affect more than first-level subdomains.

I do not know this until recently when I create #11998 looking at rules.js. Maybe we should document this behavior.

RReverser commented 7 years ago

It is documented at https://www.eff.org/https-everywhere/rulesets like some other details, but yes, looks like that information is out of sync with contributing guidelines in the repo.

RReverser commented 6 years ago

Automated fix was merged, so closing this as resolved.