TrianguloY / URLCheck

Android app by TrianguloY: URLCheck

Regex transform #170

Open danielphan2003 opened 1 year ago

danielphan2003 commented 1 year ago

Is your feature request related to a problem? Please describe. When replacing a URL with a regex, there are some cases where you need to transform the matched text. For example: example.com/HeY ➔ example.com/hey

Describe a solution you've considered None.

Describe alternatives you've considered None.

Additional context None.

TrianguloY commented 1 year ago

Do you mean using a regexp transformation, like '/H(.)Y' -> '/h$1y'? The current patterns checker module does allow for a replacement. Some of the included patterns use it, and you can find more on the wiki.

Or do you mean something else like changing the characters case (lower/upper)?
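For readers unfamiliar with that kind of replacement, here is a minimal Java sketch of the '/H(.)Y' -> '/h$1y' example. The class and method names are hypothetical and are not taken from URLCheck. It only illustrates how a capture group is reused in a `java.util.regex` replacement string.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not URLCheck's actual code.
final class PatternReplaceSketch {

    // Replaces every match of `regex` in `url` with `replacement`,
    // where $1, $2, ... refer to capture groups of the match.
    static String replace(String url, String regex, String replacement) {
        return Pattern.compile(regex).matcher(url).replaceAll(replacement);
    }
}
```

Calling `PatternReplaceSketch.replace("example.com/HeY", "/H(.)Y", "/h$1y")` yields `example.com/hey`, but only because the literal letters are hard-coded in the pattern. It is not a general case conversion.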

danielphan2003 commented 1 year ago

Oh, I meant for the lower/upper case transformation.

TrianguloY commented 1 year ago

Ah, unfortunately Java's regexp replacement syntax doesn't support changing case. I could add a module with two buttons, or at least one to make the URL lowercase. In any case, what is the reasoning for this feature? Aren't URLs case-insensitive?
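Because a `java.util.regex` replacement string can only reference groups, a case change has to happen in code around the match. One possible approach is sketched below, assuming a hypothetical helper name. It lowercases every match with the long-standing `appendReplacement`/`appendTail` loop, and it is not how URLCheck implements anything.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not URLCheck's actual code.
final class LowercaseMatchesSketch {

    // Lowercases every substring of `input` that matches `regex`,
    // leaving the rest of the string untouched.
    static String lowercaseMatches(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String lowered = m.group().toLowerCase(Locale.ROOT);
            // quoteReplacement keeps '$' and '\' in the match literal.
            m.appendReplacement(out, Matcher.quoteReplacement(lowered));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, `LowercaseMatchesSketch.lowercaseMatches("example.com/HeY", "/[A-Za-z]+$")` returns `example.com/hey`.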

danielphan2003 commented 1 year ago

The scheme (e.g. https vs HTTPS) and the host (e.g. example.com vs EXamPLe.COm) of a URI are case-insensitive, but the rest (paths, queries etc.) are not, at least according to this answer on https://stackoverflow.com/questions/15641694/are-uris-case-insensitive :

RFC 3986 states:

the scheme and host are case-insensitive and therefore should be normalized to lowercase. For example, the URI HTTP://www.EXAMPLE.com/ is equivalent to http://www.example.com/. The other generic syntax components are assumed to be case-sensitive unless specifically defined otherwise by the scheme

RFC 2616 defines the following comparison rule for the HTTP scheme:

When comparing two URIs to decide if they match or not, a client SHOULD use a case-sensitive octet-by-octet comparison of the entire URIs, with these exceptions:

However, RFC 7230 locks it down further by stating

The scheme and host are case-insensitive and normally provided in lowercase; all other components are compared in a case-sensitive manner.

Those rules typically apply to client side comparisons. There are no rules specifically geared for server side comparisons. Once a server breaks up a URI into its components, it should treat them according to the same rules, but I don't see that enforced in the RFCs. Some web servers, like Apache, do follow the rules. IIS doesn't, for compatibility with Windows' case-insensitive file system.
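The comparison rule quoted above can be sketched in Java roughly as follows: scheme and host are compared ignoring case, and every other component is compared exactly. The class and method names are hypothetical. The sketch also skips further normalization the RFCs describe, such as default ports and percent-encoding, so treat it as an illustration only.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical helper for illustration; a simplified reading of the quoted rule.
final class UriCompareSketch {

    // Scheme and host: case-insensitive. Everything else: case-sensitive.
    static boolean equivalent(String a, String b) {
        URI ua = URI.create(a);
        URI ub = URI.create(b);
        return equalsIgnoreCase(ua.getScheme(), ub.getScheme())
                && equalsIgnoreCase(ua.getHost(), ub.getHost())
                && ua.getPort() == ub.getPort()
                && Objects.equals(ua.getRawUserInfo(), ub.getRawUserInfo())
                && Objects.equals(ua.getRawPath(), ub.getRawPath())
                && Objects.equals(ua.getRawQuery(), ub.getRawQuery())
                && Objects.equals(ua.getRawFragment(), ub.getRawFragment());
    }

    // Null-safe case-insensitive comparison (components may be absent).
    private static boolean equalsIgnoreCase(String x, String y) {
        return x == null ? y == null : x.equalsIgnoreCase(y);
    }
}
```

With this sketch, `HTTP://www.EXAMPLE.com/` and `http://www.example.com/` compare as equivalent, while `http://example.com/HeY` and `http://example.com/hey` do not.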

TrianguloY commented 1 year ago

So, for that reason, a module to change the URL case is not a good idea, since it would break most of them. It could change only the left part (scheme and host) though, but I'm not sure if that's really useful in any case.

Since you mentioned regexp, is there any specific use case you know of that may benefit from this feature? Maybe there is a shortener-like service that produces the long URLs with a different case, so you need to change it when unshortening?

Murilogs1910 commented 1 year ago

I was actually searching for a way to change the uppercase characters to lowercase in the domain.

It's useful since I could send you a malicious link like https://googIe.com and it would be hard to notice there's an uppercase i instead of a lowercase l. However, if the domain characters were all lowercase, it would be easier to determine that's a fake website.
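A minimal Java sketch of that idea follows, assuming a hypothetical helper that lowercases only the scheme and host and leaves user info, path, query and fragment untouched. The regular expression is a simplification for `scheme://authority...` URLs and is not URLCheck's implementation.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not URLCheck's actual code.
final class LowercaseHostSketch {

    // group 1: scheme, group 2: authority ([userinfo@]host[:port]), group 3: the rest
    private static final Pattern URL_PARTS =
            Pattern.compile("^([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)(.*)$");

    static String lowercaseSchemeAndHost(String url) {
        Matcher m = URL_PARTS.matcher(url);
        if (!m.matches()) {
            return url; // not a scheme://authority URL, leave unchanged
        }
        String scheme = m.group(1).toLowerCase(Locale.ROOT);
        String authority = m.group(2);
        // User info (before the last '@') is case-sensitive, so keep it as-is.
        int at = authority.lastIndexOf('@');
        String userInfo = authority.substring(0, at + 1);
        String hostAndPort = authority.substring(at + 1).toLowerCase(Locale.ROOT);
        return scheme + "://" + userInfo + hostAndPort + m.group(3);
    }
}
```

With this sketch, `https://googIe.com` becomes `https://googie.com`, which makes the swapped letter visible, while a path such as `/HeY` is preserved.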

It makes even more sense to add this if you consider the pattern checker, by default, checks for non-ascii characters to help recognize websites used for phishing, as explained in the app:

- Warning when contains non-ascii characters like greek letters. This can be used for phishing: googÍe.com vs google.com
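A check of that kind could look roughly like the Java sketch below. The method name is hypothetical and the app's real rule may be implemented differently (for instance as a regular expression). The sketch simply reports whether any character falls outside the 7-bit ASCII range.

```java
// Hypothetical helper for illustration; not URLCheck's actual rule.
final class NonAsciiSketch {

    // Returns true if the text contains any character above U+007F.
    static boolean hasNonAscii(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 0x7F) {
                return true;
            }
        }
        return false;
    }
}
```

Note that such a check flags `googÍe.com` but not `googIe.com`, since an uppercase ASCII `I` is still ASCII.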

TrianguloY commented 1 year ago

That's... a very good reason indeed. According to the standard, the domain part of a URL is case-insensitive, and some apps even convert it to lowercase. It's a shame that Java regexp doesn't support case syntax, because replacing the domain with its lowercase version would be a nice addition. I wonder if having it as a special rule in that module could be problematic. A new module just for that seems a bit too much, but with this new information maybe it's worth it.

I'm thinking of a module with two buttons: