EFForg / https-everywhere

A browser extension that encrypts your communications with many websites that offer HTTPS but still allow unencrypted connections.
https://eff.org/https-everywhere
Other
3.36k stars 1.09k forks source link

Regexps with suspicious /+ or /* in the codebase #12295

Closed RReverser closed 6 years ago

RReverser commented 7 years ago

These were found as part of https://github.com/EFForg/https-everywhere/pull/12294 (although given that it checked only part of rulesets, there might be more). By "suspicious" I mean that in these cases author probably meant /.* or just / - multiple slashes in a row are unlikely to be the target unless it's some exotic case.

ghost commented 7 years ago

/+ is intentional, multiple slashes should be treated as one slash. /* is a misspelling of /+.

RReverser commented 7 years ago

Disregard previous comment Why is that needed? (generally curious) How often URLs actually have several slashes after a host that need to be squashed when transforming?

ghost commented 7 years ago

@RReverser Just explaining that /* is an error, but /+ isn't. Do you want me to make a PR to change all instances of /* to /+ or are you going to do that yourself?

ghost commented 7 years ago

@RReverser Actually I can't see any case where /* is used incorrectly here, since in all cases I see it follows a /.

ghost commented 7 years ago

@RReverser Close I guess, unless you can find any place where /* does not follow a slash, allowing URLs like https://example.comexample/.

RReverser commented 7 years ago

I didn't say it's an error, I just said it's questionable because I don't see why these multi-slashes are required ro be matched. If there is no real need for that, many of these rules could be trivialized by the same PR. Hence the question on whether these rules are actually required or they meant to be something else or I can safely trivialize them where possible.

RReverser commented 7 years ago

@bisaloo I see you already started doing that for some rulesets manually, let me know if you want me to run trivializer script with an assumption that /+ can match just /.

Bisaloo commented 7 years ago

I don't really know.

I have seen domains doing strange stuff with multiple slashes. As far as I'm concerned, it's totally possible those /+ and /* (or at least some of them) are here for a reason.

I will let you know if I find an example where they are actually needed.

RReverser commented 7 years ago

@Bisaloo Thank you, that would be helpful!

Bisaloo commented 6 years ago

@RReverser, an example of a double slash in the path:

http://www.imdb.com//images/b.gif

which is loaded on any movie page from IMDb (for example http://www.imdb.com/title/tt0120737/)

In this case, we can still write a rule without /+ or /* because this path is valid with both https and http but as I said before, I would not be surprised if some websites had an unexpected behaviour with multiple slashes.

RReverser commented 6 years ago

@Bisaloo @Hainish Is the decision here that this is not an issue, then? Should I close it?

Hainish commented 6 years ago

I don't necessarily view multiple /'s as suspicious, for the above reasons. Since it is the servers that determine how to respond to multiple /'s, it wouldn't surprise me if there is strange behavior out there from misconfigured or obscure web servers. I'm inclined to keep these. Closing.