getkirby / kirby

Kirby's core application folder
https://getkirby.com
Other
1.32k stars 168 forks source link

URL validator invalid for IPs and localhost #2130

Closed GiantCrocodile closed 5 years ago

GiantCrocodile commented 5 years ago

Describe the bug The URL validator is partially broken. A field with URL validation in a structure returns false for the provided examples.

Expected behavior These two entries should be valid URLs (with and without httpS):

  1. https://127.0.0.1/kirby/panel/pages/blog+vvvv
  2. https://localhost/kirby/panel/pages/blog+vvvv
  3. URLs with ports specified

Kirby Version 3.2.5-rc.2

Console output No errors.

lukasbestle commented 5 years ago

URL validation is a pretty complex issue, see the comment in our implementation. 😢

Nr. 2 is a known issue (see our unit tests).

I can reproduce Nr. 1 and 3 but I'm wondering why those two happen. We do have unit tests for similar cases that pass... 🤔

GiantCrocodile commented 5 years ago

It's all about strict vs same ;) *.

afbora commented 5 years ago

I wonder that what is the FILTER_VALIDATE_URL filter not able to do?

lukasbestle commented 5 years ago

It doesn‘t support internationalized domain names (IDN) for example. If I remember correctly, it also has various other bugs so that it is sometimes too strict and sometimes accepts strings that aren‘t valid URLs.

bastianallgeier commented 5 years ago

I wonder if we could make it less strict by doing a three-tier layered validation approach.

1.) use filter_validate_url 2.) if that fails use our regex 3.) if that fails check for localhost addresses

We do something similar in the V::email validator.

afbora commented 5 years ago

What do you think about Laravel's validation method?

https://github.com/laravel/framework/blob/master/src/Illuminate/Validation/Concerns/ValidatesAttributes.php#L1624-L1652

lukasbestle commented 5 years ago

@bastianallgeier Hm, but that would break for false-positives.

We could try the Laravel/Symfony one, maybe it‘s better than ours.

lukasbestle commented 5 years ago

I love it how everyone copies implementations from somewhere else. The PHP team should really fix the official filter. 😅

nilshoerrmann commented 5 years ago

Hm, but that would break for false-positives.

Doesn't any validator break for false-positives? That's true for the Laravel/Symfony one, too, isn't it?

lukasbestle commented 5 years ago

Yes, but because we know that the PHP filter is buggy, we probably shouldn‘t use that one in our validator chain.

bastianallgeier commented 5 years ago

You are right. We should skip the filter. I don't mind testing the Symfony regex.

GiantCrocodile commented 5 years ago

Btw any clue why case 1 and 3 aren't triggered? Is it really related to the current (weak) regex or maybe some other reason?

lukasbestle commented 5 years ago

Probably some other reason, otherwise the tests wouldn't pass. I will do some research when implementing a fix for this issue.

GiantCrocodile commented 5 years ago

Thank you :) that's why I asked; I didn't want to risk that the regex is fixed but the 2 cases stay unresolved. I appreciate your afford.

bastianallgeier commented 5 years ago

First result of my tests with Symfony's validator pattern: their pattern does not accept URLs with unicode characters in host names.

lukasbestle commented 5 years ago

I have run the Symfony RegEx against our existing unit tests with the following results:

Should validate, but fails:

http://foo.com/unicode_(✪)_in_parens
http://foo.bar/?q=Test%20URL-encoded%20stuff
http://उदाहरण.परीक्षा
http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com

Should fail, but validates:

  http://-error-.invalid/
  http://a.b--c.de/
  http://-a.b.co
  http://a.b-.co
  http://0.0.0.0
  http://10.1.1.0
  http://10.1.1.255
  http://224.1.1.1
  http://1.1.1.1.1
  http://123.123.123
* http://3628126748
  http://.www.foo.bar/
  http://www.foo.bar./
  http://.www.foo.bar./
* http://10.1.1.1
* http://10.1.1.254

(URLs marked with a * are ones where I'm not sure why they should fail as those URLs should actually be valid.)

Works (unlike the current RegEx):

http://special---offer.com/
http://localhost/test/
http://localhost:8080/test
lukasbestle commented 5 years ago

And the results with filter_var($value, FILTER_VALIDATE_URL) !== false for comparison:

Should validate, but fails:

http://✪df.ws/123
http://➡.ws/䨹
http://⌘.ws
http://⌘.ws/
http://foo.com/unicode_(✪)_in_parens
http://☺.damowmow.com/
http://مثال.إختبار
http://例子.测试
http://उदाहरण.परीक्षा

Should fail, but validates:

  rdar://1234
  h://test
  ftps://foo.bar/
  http://a.b--c.de/
  http://0.0.0.0
  http://10.1.1.0
  http://10.1.1.255
  http://224.1.1.1
  http://1.1.1.1.1
  http://123.123.123
* http://3628126748
  http://www.foo.bar./
* http://10.1.1.1
* http://10.1.1.254

Works (unlike the current RegEx):

http://special---offer.com/
http://localhost/test/
http://localhost:8080/test
afbora commented 5 years ago

Here Laravel's method results based on Kirby unit test urls:

Status URL Expect Result
SUCCESS http://www.getkirby.com true true
SUCCESS http://www.getkirby.com/docs/param:value/?foo=bar/#anchor true true
SUCCESS https://www.getkirby.de.vu true true
SUCCESS http://foo.com/blah_blah true true
SUCCESS http://foo.com/blah_blah/ true true
SUCCESS http://foo.com/blah_blah_(wikipedia) true true
SUCCESS http://foo.com/blah_blah_(wikipedia)_(again) true true
SUCCESS http://www.example.com/wpstyle/?p=364 true true
SUCCESS https://www.example.com/foo/?bar=baz&inga=42&quux true true
SUCCESS http://✪df.ws/123 true true
SUCCESS http://userid:password@example.com:8080 true true
SUCCESS http://userid:password@example.com:8080/ true true
SUCCESS http://userid@example.com true true
SUCCESS http://userid@example.com/ true true
SUCCESS http://userid@example.com:8080 true true
SUCCESS http://userid@example.com:8080/ true true
SUCCESS http://userid:password@example.com true true
SUCCESS http://userid:password@example.com/ true true
SUCCESS http://142.42.1.1/ true true
SUCCESS http://142.42.1.1:8080/ true true
SUCCESS http://➡.ws/䨹 true true
SUCCESS http://⌘.ws true true
SUCCESS http://⌘.ws/ true true
SUCCESS http://foo.com/blah_(wikipedia)#cite-1 true true
SUCCESS http://foo.com/blah_(wikipedia)_blah#cite-1 true true
SUCCESS http://foo.com/unicode_(✪)_in_parens true true
SUCCESS http://foo.com/(something)?after=parens true true
SUCCESS http://☺.damowmow.com/ true true
SUCCESS http://code.google.com/events/#&product=browser true true
SUCCESS http://j.mp true true
SUCCESS ftp://foo.bar/baz true true
SUCCESS http://foo.bar/?q=Test%20URL-encoded%20stuff true true
SUCCESS http://مثال.إختبار true true
SUCCESS http://例子.测试 true true
FAIL http://उदाहरण.परीक्ष true false
FAIL http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com true false
SUCCESS http://1337.net true true
SUCCESS http://a.b-c.de true true
SUCCESS http://223.255.255.254 true true
SUCCESS http://special---offer.com/ true true
SUCCESS http://localhost/test/ true true
SUCCESS http://localhost:8080/test true true
SUCCESS foo false false
SUCCESS http:// false false
SUCCESS http://. false false
SUCCESS http://.. false false
SUCCESS http://../ false false
SUCCESS http://? false false
SUCCESS http://?? false false
SUCCESS http://??/ false false
SUCCESS http://# false false
SUCCESS http://## false false
SUCCESS http://##/ false false
SUCCESS http://foo.bar?q=Spaces should be encoded false false
SUCCESS // false false
SUCCESS //a false false
SUCCESS ///a false false
SUCCESS /// false false
SUCCESS http:///a false false
SUCCESS foo.com false false
SUCCESS rdar://1234 false false
SUCCESS h://test false false
SUCCESS http:// shouldfail.com false false
SUCCESS :// should fail false false
SUCCESS http://foo.bar/foo(bar)baz quux false false
SUCCESS ftps://foo.bar/ false false
FAIL http://-error-.invalid/ false true
FAIL http://a.b--c.de/ false true
FAIL http://-a.b.co false true
FAIL http://a.b-.co false true
FAIL http://0.0.0.0 false true
FAIL http://10.1.1.0 false true
FAIL http://10.1.1.255 false true
FAIL http://224.1.1.1 false true
SUCCESS http://1.1.1.1.1 false false
SUCCESS http://123.123.123 false false
SUCCESS http://3628126748 false false
FAIL http://.www.foo.bar/ false true
FAIL http://www.foo.bar./ false true
FAIL http://.www.foo.bar./ false true
FAIL http://10.1.1.1 false true
FAIL http://10.1.1.254 false true
afbora commented 5 years ago

I tried to rewrite (added localhost) our validation myself. What do you think?

'url' => function ($value): bool {
    // In search for the perfect regular expression: https://mathiasbynens.be/demo/url-regex
    $regex = '_^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:localhost)|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?$_iu';
    return preg_match($regex, $value) !== 0;
}

Results

Status URL Expect Result
SUCCESS http://www.getkirby.com true true
SUCCESS http://www.getkirby.com/docs/param:value/?foo=bar/#anchor true true
SUCCESS https://www.getkirby.de.vu true true
SUCCESS http://foo.com/blah_blah true true
SUCCESS http://foo.com/blah_blah/ true true
SUCCESS http://foo.com/blah_blah_(wikipedia) true true
SUCCESS http://foo.com/blah_blah_(wikipedia)_(again) true true
SUCCESS http://www.example.com/wpstyle/?p=364 true true
SUCCESS https://www.example.com/foo/?bar=baz&inga=42&quux true true
SUCCESS http://✪df.ws/123 true true
SUCCESS http://userid:password@example.com:8080 true true
SUCCESS http://userid:password@example.com:8080/ true true
SUCCESS http://userid@example.com true true
SUCCESS http://userid@example.com/ true true
SUCCESS http://userid@example.com:8080 true true
SUCCESS http://userid@example.com:8080/ true true
SUCCESS http://userid:password@example.com true true
SUCCESS http://userid:password@example.com/ true true
SUCCESS http://142.42.1.1/ true true
SUCCESS http://142.42.1.1:8080/ true true
SUCCESS http://➡.ws/䨹 true true
SUCCESS http://⌘.ws true true
SUCCESS http://⌘.ws/ true true
SUCCESS http://foo.com/blah_(wikipedia)#cite-1 true true
SUCCESS http://foo.com/blah_(wikipedia)_blah#cite-1 true true
SUCCESS http://foo.com/unicode_(✪)_in_parens true true
SUCCESS http://foo.com/(something)?after=parens true true
SUCCESS http://☺.damowmow.com/ true true
SUCCESS http://code.google.com/events/#&product=browser true true
SUCCESS http://j.mp true true
SUCCESS ftp://foo.bar/baz true true
SUCCESS http://foo.bar/?q=Test%20URL-encoded%20stuff true true
SUCCESS http://مثال.إختبار true true
SUCCESS http://例子.测试 true true
SUCCESS http://उदाहरण.परीक्ष true true
SUCCESS http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com true true
SUCCESS http://1337.net true true
SUCCESS http://a.b-c.de true true
SUCCESS http://223.255.255.254 true true
FAIL http://special---offer.com/ true false
SUCCESS https://127.0.0.1/kirby/ true true
SUCCESS http://localhost/test/ true true
SUCCESS http://localhost:8080/test true true
SUCCESS https://127.0.0.1/kirby/panel/pages/blog+vvvv true true
SUCCESS foo false false
SUCCESS http:// false false
SUCCESS http://. false false
SUCCESS http://.. false false
SUCCESS http://../ false false
SUCCESS http://? false false
SUCCESS http://?? false false
SUCCESS http://??/ false false
SUCCESS http://# false false
SUCCESS http://## false false
SUCCESS http://##/ false false
SUCCESS http://foo.bar?q=Spaces should be encoded false false
SUCCESS // false false
SUCCESS //a false false
SUCCESS ///a false false
SUCCESS /// false false
SUCCESS http:///a false false
SUCCESS foo.com false false
SUCCESS rdar://1234 false false
SUCCESS h://test false false
SUCCESS http:// shouldfail.com false false
SUCCESS :// should fail false false
SUCCESS http://foo.bar/foo(bar)baz quux false false
SUCCESS ftps://foo.bar/ false false
SUCCESS http://-error-.invalid/ false false
SUCCESS http://a.b--c.de/ false false
SUCCESS http://-a.b.co false false
SUCCESS http://a.b-.co false false
SUCCESS http://0.0.0.0 false false
SUCCESS http://10.1.1.0 false false
SUCCESS http://10.1.1.255 false false
SUCCESS http://224.1.1.1 false false
SUCCESS http://1.1.1.1.1 false false
SUCCESS http://123.123.123 false false
SUCCESS http://3628126748 false false
SUCCESS http://.www.foo.bar/ false false
SUCCESS http://www.foo.bar./ false false
SUCCESS http://.www.foo.bar./ false false
SUCCESS http://10.1.1.1 false false
SUCCESS http://10.1.1.254 false false

Note: I think, http://special---offer.com/ should be fail because of hyphen restrictions

The Unicode string MUST NOT contain "--" (two consecutive hyphens) in the third and fourth character positions and MUST NOT start or end with a "-" (hyphen).

Reference: https://tools.ietf.org/html/rfc5891#section-4.2.3.1

lukasbestle commented 5 years ago

@afbora The restriction with the double hyphens only applies to the hyphens being in the third and fourth character positions, e.g. http://sp---ecialoffer.com. So the test case should succeed.

Have you tried the two other test cases that have been originally reported by @GiantCrocodile?

afbora commented 5 years ago

@lukasbestle Hmm, this is really hard. I understand the rule is missing.

SUCCESS http://localhost/test/ true true SUCCESS http://localhost:8080/test true true SUCCESS https://127.0.0.1/kirby/panel/pages/blog+vvvv true true

Yes, you can see them in the above results.

Only reported by @GiantCrocodile tests

Status URL Expect Result
SUCCESS https://127.0.0.1/kirby/panel/pages/blog+vvvv true true
SUCCESS https://127.0.0.1:8080/kirby/panel/pages/blog+vvvv true true
SUCCESS https://localhost/kirby/panel/pages/blog+vvvv true true
SUCCESS https://localhost:8080/kirby/panel/pages/blog+vvvv true true
SUCCESS http://www.getkirby.com:8080/kirby/panel/pages/blog+vvvv true true
lukasbestle commented 5 years ago

Awesome! Could you please send a PR with the changes you made (new RegEx, new tests)? It's already better than the current state, so it's no problem that that one strange test case doesn't pass.

GiantCrocodile commented 5 years ago

How comes that tests like http://10.1.1.255 should be false? I think that HTTP protocol is fine for IPs instead of domains. Is it about a / at the end of the IP? My browser detects them as valid links too.

lukasbestle commented 5 years ago

@GiantCrocodile 10.1.1.255 is a broadcast address (because of the 255 value in the last position). It's a valid IP address, but cannot be part of a valid URL as it won't point to a specific computer.

bastianallgeier commented 5 years ago

GiantCrocodile commented 5 years ago

I have an issue with this: I'm using latest RC and when I put in an URL like http://localhost/kirby/panel/pages/blog+cdwcwd the URL field is highlighted red (invalid input) but I can save it anyway. Some visual validator is still wrong. @bastianallgeier @afbora

grafik

afbora commented 5 years ago

@GiantCrocodile I guess this is vuelidate library bug: https://github.com/vuelidate/vuelidate/issues/375 And i have no idea how to figure it out.

nilshoerrmann commented 5 years ago

Might be related to https://github.com/getkirby/kirby/issues/1514 as well.

bastianallgeier commented 5 years ago

Yep, it's related. We wanted to be smart with the instant validation, but in this case it's sending mixed messages because the JS validator is different from the PHP validator. I guess it would be best to remove the instant validation entirely.

GiantCrocodile commented 5 years ago

I guess this is something which need a bigger refactoring @bastianallgeier. Shall I create a new issue about deleting the instant validation or do you keep care of tracking this?

bastianallgeier commented 5 years ago

@GiantCrocodile a new issue for the instant validator would be fantastic. I will close this again.

vishaltripathi189 commented 3 years ago

This is the type of local URL that we get through our CI/CD platform. I think it's a valid URL, But it is getting an invalid URL. @afbora or anyone can you check this?

http://its-my-domain.added-domain:8080/my-endpoint

afbora commented 3 years ago

@vishaltripathi189 I've tested on 3.5.5 and 3.6.0-alpha.2 again. I couldn't reproduce the issue. Input highlighted as red but this is related #2270 issue.