EFForg / https-everywhere

A browser extension that encrypts your communications with many websites that offer HTTPS but still allow unencrypted connections.
https://eff.org/https-everywhere
Other
3.36k stars 1.09k forks source link

Non-standard wildcard in Live.xml #2981

Closed RReverser closed 8 years ago

RReverser commented 9 years ago

Live.xml is the only ruleset that contains wildcard as following:

<target host="secure.*.live.com" />

From my understanding, such wildcard can be considered incorrect given that documentation explicitly states:

To cover all of a domain's subdomains, you may want to specify a wildcard target like *.twitter.com. Specifying this type of left-side wildcard matches any host name with .twitter.com as a suffix, e.g. www.twitter.com or urls.api.twitter.com. You can also specify a right-side wildcard like www.google.*. Right-side wildcards, unlike left-side wildcards, apply only one level deep. So if you want to cover all countries you'll generally need to specify www.google.*, www.google.co.*, and www.google.com.* to cover domains like www.google.co.uk or www.google.com.au. You should use wildcard targets only when you have rules that apply to the entire wildcard space. If your rules only apply to specific hosts, you should list each host as a separate target.

Such wildcard is not neither left-side nor right-side, and it's unclear how many domain levels it might contain when placed in between. Due to such ambiguity, I believe we should either document the behavior explicitly or just replace this single rule with broader left-side one.

RReverser commented 8 years ago

@fuglede Could you please take a look at this one as well? It's the only test that is left failing for me in custom rewriter.

fuglede commented 8 years ago

This isn't really anything I've looked at before, but I had a quick look around.

The relevant code (for Firefox) is this piece here; it covers only one level. There's probably a good reason why it was made like this, but yeah, to me it also seems a bit excessive if it's only used in one place, and if in that one place it's only used to cover one extra domain, as it would seem from the test URLs.

In any case, since the functionality is there and since it does something non-trivial, I think it should just be properly documented.

RReverser commented 8 years ago

Yeah, it's just one domain in entire codeset (~7.5K URLs), so I hoped it might be simply rewritten to the same format as others instead of changing documentation just for one case (such advanced pattern as secure.*.live.com can very well live in regular expression IMO while domain will be just *.live.com).

fuglede commented 8 years ago

I don't disagree. In the particular case, it would seem that secure.*.live.com could just be replaced by secure.shared.live.com throughout. But if that's the case, it would have probably been done like that in the first place. You could run it by its author.

RReverser commented 8 years ago

Yeah, looks so from the comment - still wondering why such form was chosen then - just for precaution? /cc @2d1

RReverser commented 8 years ago

Still need to do something about this one :( Would you accept a PR that replaces rule with secure.shared.live.com as you suggested? @fuglede

fuglede commented 8 years ago

Yeah, let's just do that.

RReverser commented 8 years ago

That one was fixed but new one appeared :( https://github.com/EFForg/https-everywhere/commit/9b29cd9c#diff-bff2e40dfbb4f6248731e1f12b55f792R89

Looks like the commit itself is old, but was merged only recently.

fuglede commented 8 years ago

Did you look into the necessity of that one? I didn't, but judging by the exclusion pattern, it might be good. In any case a bit weird/suboptimal that our test coverage checker didn't complain about that one.

jsha commented 8 years ago

Agreed we want to remove internal * in target hosts, and remove it from support in the rewriter.