Closed GiantCrocodile closed 5 years ago
URL validation is a pretty complex issue, see the comment in our implementation. 😢
Nr. 2 is a known issue (see our unit tests).
I can reproduce Nr. 1 and 3 but I'm wondering why those two happen. We do have unit tests for similar cases that pass... 🤔
It's all about strict vs same ;) *.
I wonder that what is the FILTER_VALIDATE_URL
filter not able to do?
It doesn‘t support internationalized domain names (IDN) for example. If I remember correctly, it also has various other bugs so that it is sometimes too strict and sometimes accepts strings that aren‘t valid URLs.
I wonder if we could make it less strict by doing a three-tier layered validation approach.
1.) use filter_validate_url 2.) if that fails use our regex 3.) if that fails check for localhost addresses
We do something similar in the V::email validator.
What do you think about Laravel's validation method?
@bastianallgeier Hm, but that would break for false-positives.
We could try the Laravel/Symfony one, maybe it‘s better than ours.
I love it how everyone copies implementations from somewhere else. The PHP team should really fix the official filter. 😅
Hm, but that would break for false-positives.
Doesn't any validator break for false-positives? That's true for the Laravel/Symfony one, too, isn't it?
Yes, but because we know that the PHP filter is buggy, we probably shouldn‘t use that one in our validator chain.
You are right. We should skip the filter. I don't mind testing the Symfony regex.
Btw any clue why case 1 and 3 aren't triggered? Is it really related to the current (weak) regex or maybe some other reason?
Probably some other reason, otherwise the tests wouldn't pass. I will do some research when implementing a fix for this issue.
Thank you :) that's why I asked; I didn't want to risk that the regex is fixed but the 2 cases stay unresolved. I appreciate your afford.
First result of my tests with Symfony's validator pattern: their pattern does not accept URLs with unicode characters in host names.
I have run the Symfony RegEx against our existing unit tests with the following results:
Should validate, but fails:
http://foo.com/unicode_(✪)_in_parens
http://foo.bar/?q=Test%20URL-encoded%20stuff
http://उदाहरण.परीक्षा
http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com
Should fail, but validates:
http://-error-.invalid/
http://a.b--c.de/
http://-a.b.co
http://a.b-.co
http://0.0.0.0
http://10.1.1.0
http://10.1.1.255
http://224.1.1.1
http://1.1.1.1.1
http://123.123.123
* http://3628126748
http://.www.foo.bar/
http://www.foo.bar./
http://.www.foo.bar./
* http://10.1.1.1
* http://10.1.1.254
(URLs marked with a *
are ones where I'm not sure why they should fail as those URLs should actually be valid.)
Works (unlike the current RegEx):
http://special---offer.com/
http://localhost/test/
http://localhost:8080/test
And the results with filter_var($value, FILTER_VALIDATE_URL) !== false
for comparison:
Should validate, but fails:
http://✪df.ws/123
http://➡.ws/䨹
http://⌘.ws
http://⌘.ws/
http://foo.com/unicode_(✪)_in_parens
http://☺.damowmow.com/
http://مثال.إختبار
http://例子.测试
http://उदाहरण.परीक्षा
Should fail, but validates:
rdar://1234
h://test
ftps://foo.bar/
http://a.b--c.de/
http://0.0.0.0
http://10.1.1.0
http://10.1.1.255
http://224.1.1.1
http://1.1.1.1.1
http://123.123.123
* http://3628126748
http://www.foo.bar./
* http://10.1.1.1
* http://10.1.1.254
Works (unlike the current RegEx):
http://special---offer.com/
http://localhost/test/
http://localhost:8080/test
Here Laravel's method results based on Kirby unit test urls:
Status | URL | Expect | Result |
---|---|---|---|
SUCCESS | http://www.getkirby.com | true | true |
SUCCESS | http://www.getkirby.com/docs/param:value/?foo=bar/#anchor | true | true |
SUCCESS | https://www.getkirby.de.vu | true | true |
SUCCESS | http://foo.com/blah_blah | true | true |
SUCCESS | http://foo.com/blah_blah/ | true | true |
SUCCESS | http://foo.com/blah_blah_(wikipedia) | true | true |
SUCCESS | http://foo.com/blah_blah_(wikipedia)_(again) | true | true |
SUCCESS | http://www.example.com/wpstyle/?p=364 | true | true |
SUCCESS | https://www.example.com/foo/?bar=baz&inga=42&quux | true | true |
SUCCESS | http://✪df.ws/123 | true | true |
SUCCESS | http://userid:password@example.com:8080 | true | true |
SUCCESS | http://userid:password@example.com:8080/ | true | true |
SUCCESS | http://userid@example.com | true | true |
SUCCESS | http://userid@example.com/ | true | true |
SUCCESS | http://userid@example.com:8080 | true | true |
SUCCESS | http://userid@example.com:8080/ | true | true |
SUCCESS | http://userid:password@example.com | true | true |
SUCCESS | http://userid:password@example.com/ | true | true |
SUCCESS | http://142.42.1.1/ | true | true |
SUCCESS | http://142.42.1.1:8080/ | true | true |
SUCCESS | http://➡.ws/䨹 | true | true |
SUCCESS | http://⌘.ws | true | true |
SUCCESS | http://⌘.ws/ | true | true |
SUCCESS | http://foo.com/blah_(wikipedia)#cite-1 | true | true |
SUCCESS | http://foo.com/blah_(wikipedia)_blah#cite-1 | true | true |
SUCCESS | http://foo.com/unicode_(✪)_in_parens | true | true |
SUCCESS | http://foo.com/(something)?after=parens | true | true |
SUCCESS | http://☺.damowmow.com/ | true | true |
SUCCESS | http://code.google.com/events/#&product=browser | true | true |
SUCCESS | http://j.mp | true | true |
SUCCESS | ftp://foo.bar/baz | true | true |
SUCCESS | http://foo.bar/?q=Test%20URL-encoded%20stuff | true | true |
SUCCESS | http://مثال.إختبار | true | true |
SUCCESS | http://例子.测试 | true | true |
FAIL | http://उदाहरण.परीक्षा | true | false |
FAIL | http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com | true | false |
SUCCESS | http://1337.net | true | true |
SUCCESS | http://a.b-c.de | true | true |
SUCCESS | http://223.255.255.254 | true | true |
SUCCESS | http://special---offer.com/ | true | true |
SUCCESS | http://localhost/test/ | true | true |
SUCCESS | http://localhost:8080/test | true | true |
SUCCESS | foo | false | false |
SUCCESS | http:// | false | false |
SUCCESS | http://. | false | false |
SUCCESS | http://.. | false | false |
SUCCESS | http://../ | false | false |
SUCCESS | http://? | false | false |
SUCCESS | http://?? | false | false |
SUCCESS | http://??/ | false | false |
SUCCESS | http://# | false | false |
SUCCESS | http://## | false | false |
SUCCESS | http://##/ | false | false |
SUCCESS | http://foo.bar?q=Spaces should be encoded | false | false |
SUCCESS | // | false | false |
SUCCESS | //a | false | false |
SUCCESS | ///a | false | false |
SUCCESS | /// | false | false |
SUCCESS | http:///a | false | false |
SUCCESS | foo.com | false | false |
SUCCESS | rdar://1234 | false | false |
SUCCESS | h://test | false | false |
SUCCESS | http:// shouldfail.com | false | false |
SUCCESS | :// should fail | false | false |
SUCCESS | http://foo.bar/foo(bar)baz quux | false | false |
SUCCESS | ftps://foo.bar/ | false | false |
FAIL | http://-error-.invalid/ | false | true |
FAIL | http://a.b--c.de/ | false | true |
FAIL | http://-a.b.co | false | true |
FAIL | http://a.b-.co | false | true |
FAIL | http://0.0.0.0 | false | true |
FAIL | http://10.1.1.0 | false | true |
FAIL | http://10.1.1.255 | false | true |
FAIL | http://224.1.1.1 | false | true |
SUCCESS | http://1.1.1.1.1 | false | false |
SUCCESS | http://123.123.123 | false | false |
SUCCESS | http://3628126748 | false | false |
FAIL | http://.www.foo.bar/ | false | true |
FAIL | http://www.foo.bar./ | false | true |
FAIL | http://.www.foo.bar./ | false | true |
FAIL | http://10.1.1.1 | false | true |
FAIL | http://10.1.1.254 | false | true |
I tried to rewrite (added localhost) our validation myself. What do you think?
'url' => function ($value): bool {
// In search for the perfect regular expression: https://mathiasbynens.be/demo/url-regex
$regex = '_^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:localhost)|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?$_iu';
return preg_match($regex, $value) !== 0;
}
Status | URL | Expect | Result |
---|---|---|---|
SUCCESS | http://www.getkirby.com | true | true |
SUCCESS | http://www.getkirby.com/docs/param:value/?foo=bar/#anchor | true | true |
SUCCESS | https://www.getkirby.de.vu | true | true |
SUCCESS | http://foo.com/blah_blah | true | true |
SUCCESS | http://foo.com/blah_blah/ | true | true |
SUCCESS | http://foo.com/blah_blah_(wikipedia) | true | true |
SUCCESS | http://foo.com/blah_blah_(wikipedia)_(again) | true | true |
SUCCESS | http://www.example.com/wpstyle/?p=364 | true | true |
SUCCESS | https://www.example.com/foo/?bar=baz&inga=42&quux | true | true |
SUCCESS | http://✪df.ws/123 | true | true |
SUCCESS | http://userid:password@example.com:8080 | true | true |
SUCCESS | http://userid:password@example.com:8080/ | true | true |
SUCCESS | http://userid@example.com | true | true |
SUCCESS | http://userid@example.com/ | true | true |
SUCCESS | http://userid@example.com:8080 | true | true |
SUCCESS | http://userid@example.com:8080/ | true | true |
SUCCESS | http://userid:password@example.com | true | true |
SUCCESS | http://userid:password@example.com/ | true | true |
SUCCESS | http://142.42.1.1/ | true | true |
SUCCESS | http://142.42.1.1:8080/ | true | true |
SUCCESS | http://➡.ws/䨹 | true | true |
SUCCESS | http://⌘.ws | true | true |
SUCCESS | http://⌘.ws/ | true | true |
SUCCESS | http://foo.com/blah_(wikipedia)#cite-1 | true | true |
SUCCESS | http://foo.com/blah_(wikipedia)_blah#cite-1 | true | true |
SUCCESS | http://foo.com/unicode_(✪)_in_parens | true | true |
SUCCESS | http://foo.com/(something)?after=parens | true | true |
SUCCESS | http://☺.damowmow.com/ | true | true |
SUCCESS | http://code.google.com/events/#&product=browser | true | true |
SUCCESS | http://j.mp | true | true |
SUCCESS | ftp://foo.bar/baz | true | true |
SUCCESS | http://foo.bar/?q=Test%20URL-encoded%20stuff | true | true |
SUCCESS | http://مثال.إختبار | true | true |
SUCCESS | http://例子.测试 | true | true |
SUCCESS | http://उदाहरण.परीक्षा | true | true |
SUCCESS | http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com | true | true |
SUCCESS | http://1337.net | true | true |
SUCCESS | http://a.b-c.de | true | true |
SUCCESS | http://223.255.255.254 | true | true |
FAIL | http://special---offer.com/ | true | false |
SUCCESS | https://127.0.0.1/kirby/ | true | true |
SUCCESS | http://localhost/test/ | true | true |
SUCCESS | http://localhost:8080/test | true | true |
SUCCESS | https://127.0.0.1/kirby/panel/pages/blog+vvvv | true | true |
SUCCESS | foo | false | false |
SUCCESS | http:// | false | false |
SUCCESS | http://. | false | false |
SUCCESS | http://.. | false | false |
SUCCESS | http://../ | false | false |
SUCCESS | http://? | false | false |
SUCCESS | http://?? | false | false |
SUCCESS | http://??/ | false | false |
SUCCESS | http://# | false | false |
SUCCESS | http://## | false | false |
SUCCESS | http://##/ | false | false |
SUCCESS | http://foo.bar?q=Spaces should be encoded | false | false |
SUCCESS | // | false | false |
SUCCESS | //a | false | false |
SUCCESS | ///a | false | false |
SUCCESS | /// | false | false |
SUCCESS | http:///a | false | false |
SUCCESS | foo.com | false | false |
SUCCESS | rdar://1234 | false | false |
SUCCESS | h://test | false | false |
SUCCESS | http:// shouldfail.com | false | false |
SUCCESS | :// should fail | false | false |
SUCCESS | http://foo.bar/foo(bar)baz quux | false | false |
SUCCESS | ftps://foo.bar/ | false | false |
SUCCESS | http://-error-.invalid/ | false | false |
SUCCESS | http://a.b--c.de/ | false | false |
SUCCESS | http://-a.b.co | false | false |
SUCCESS | http://a.b-.co | false | false |
SUCCESS | http://0.0.0.0 | false | false |
SUCCESS | http://10.1.1.0 | false | false |
SUCCESS | http://10.1.1.255 | false | false |
SUCCESS | http://224.1.1.1 | false | false |
SUCCESS | http://1.1.1.1.1 | false | false |
SUCCESS | http://123.123.123 | false | false |
SUCCESS | http://3628126748 | false | false |
SUCCESS | http://.www.foo.bar/ | false | false |
SUCCESS | http://www.foo.bar./ | false | false |
SUCCESS | http://.www.foo.bar./ | false | false |
SUCCESS | http://10.1.1.1 | false | false |
SUCCESS | http://10.1.1.254 | false | false |
Note: I think, http://special---offer.com/
should be fail because of hyphen restrictions
The Unicode string MUST NOT contain "--" (two consecutive hyphens) in the third and fourth character positions and MUST NOT start or end with a "-" (hyphen).
Reference: https://tools.ietf.org/html/rfc5891#section-4.2.3.1
@afbora The restriction with the double hyphens only applies to the hyphens being in the third and fourth character positions, e.g. http://sp---ecialoffer.com
. So the test case should succeed.
Have you tried the two other test cases that have been originally reported by @GiantCrocodile?
@lukasbestle Hmm, this is really hard. I understand the rule is missing.
SUCCESS http://localhost/test/ true true SUCCESS http://localhost:8080/test true true SUCCESS https://127.0.0.1/kirby/panel/pages/blog+vvvv true true
Yes, you can see them in the above results.
Only reported by @GiantCrocodile tests
Status | URL | Expect | Result |
---|---|---|---|
SUCCESS | https://127.0.0.1/kirby/panel/pages/blog+vvvv | true | true |
SUCCESS | https://127.0.0.1:8080/kirby/panel/pages/blog+vvvv | true | true |
SUCCESS | https://localhost/kirby/panel/pages/blog+vvvv | true | true |
SUCCESS | https://localhost:8080/kirby/panel/pages/blog+vvvv | true | true |
SUCCESS | http://www.getkirby.com:8080/kirby/panel/pages/blog+vvvv | true | true |
Awesome! Could you please send a PR with the changes you made (new RegEx, new tests)? It's already better than the current state, so it's no problem that that one strange test case doesn't pass.
How comes that tests like http://10.1.1.255
should be false? I think that HTTP protocol is fine for IPs instead of domains. Is it about a /
at the end of the IP? My browser detects them as valid links too.
@GiantCrocodile 10.1.1.255
is a broadcast address (because of the 255
value in the last position). It's a valid IP address, but cannot be part of a valid URL as it won't point to a specific computer.
✅
I have an issue with this: I'm using latest RC and when I put in an URL like http://localhost/kirby/panel/pages/blog+cdwcwd the URL field is highlighted red (invalid input) but I can save it anyway. Some visual validator is still wrong. @bastianallgeier @afbora
@GiantCrocodile I guess this is vuelidate
library bug: https://github.com/vuelidate/vuelidate/issues/375
And i have no idea how to figure it out.
Might be related to https://github.com/getkirby/kirby/issues/1514 as well.
Yep, it's related. We wanted to be smart with the instant validation, but in this case it's sending mixed messages because the JS validator is different from the PHP validator. I guess it would be best to remove the instant validation entirely.
I guess this is something which need a bigger refactoring @bastianallgeier. Shall I create a new issue about deleting the instant validation or do you keep care of tracking this?
@GiantCrocodile a new issue for the instant validator would be fantastic. I will close this again.
This is the type of local URL that we get through our CI/CD platform. I think it's a valid URL, But it is getting an invalid URL. @afbora or anyone can you check this?
@vishaltripathi189 I've tested on 3.5.5 and 3.6.0-alpha.2 again. I couldn't reproduce the issue. Input highlighted as red but this is related #2270 issue.
Describe the bug The URL validator is partially broken. A field with URL validation in a structure returns false for the provided examples.
Expected behavior These two entries should be valid URLs (with and without httpS):
Kirby Version 3.2.5-rc.2
Console output No errors.