Closed jonasfeige closed 2 years ago
@lukasbestle This issue about HTML Tidy PR on https://github.com/getkirby/kirby/commit/64615c9c22008a98ae79cd06015f76ac4e0bb8b9 commit.
✅
Appreciate the quick fix in Beta 3!
Email links work now but regular links without a protocol still don’t render the href
attribute. Is that intended?
@jonasfeige no, that's not intended. Do you have an example link?
@bastianallgeier Of course.
For example www.tagesschau.de
gives me an anchor link without href
.
Both http://www.tagesschau.de
and https://www.tagesschau.de
give me a proper anchor link including href
.
Ahhh, now I get it. That makes sense. Our new HTML sanitizer throws out invalid URLs as soon as they don't have a protocol. It's maybe a bit too strict so far. I guess we need to find a work around for such scenarios. (cc @lukasbestle)
Basically, any text in the href without a protocol is assumed to be a relative path if there is no / or protocol.
If there is no protocol, it is relative and its implementation seems complex. See @lukasbestle comment: https://github.com/getkirby/kirby/pull/3769#issuecomment-937991372
To me it would seem logical to simply require a protocol, since that is what devs/users are used to from the url
field.
I agree with @jonasfeige. The URL without protocol is invalid anyway as the browser won't go to the page the editor has expected. So I think we should rather validate the entered URL in the URL dialog so it doesn't even let the editor create a link with an invalid destination.
Of course this means that true relative links won't be allowed now. However that's already the case with the url
field elsewhere. And validating relative URLs in the sanitization process is really difficult.
There's one gotcha we need to be aware of: What happens to the content of existing writer
fields? The HTML will stay like it is, but as soon as the field content is ever edited again, Kirby would remove the href
attributes.
IMO the best solution would be to run the URL validation on all href
s in the writer field before the field is saved. Invalid URLs would then prevent the field from being saved, so they can be corrected before they are sanitized away.
You are right. It's not as simple as I thought. We still need to allow real relative paths. Otherwise you can no longer create internal links. But what if we sanitize URLs in the input like this:
if (Url::isAbsolute($url) === false) {
$absoluteUrl = 'http://' . $url;
if (V::url($absoluteUrl) === true) {
$url = $absoluteUrl;
}
}
It looks a bit nasty, but I think it might do the trick. Anything I might have missed? Could a relative path still be mistaken as valid URL as soon as we prepend the protocol?
Hm, wouldn't that change the meaning of the URL entirely? I.e. if I intend to create a relative link to projects/my-project
, I wouldn't want to get https://projects/my-project
(which is a valid URL!).
What if we pass all link destinations through Kirby's url()
helper, which will normalize the URLs to be absolute? Then they are safe and won't get sanitized away (they also wouldn't need to be). Issue: The markup then gets dependent on the base URL of the site and won't be portable anymore. We already have this issue with image URLs as far as I remember, but it's a limitation caused by storing the writer content as HTML instead of as structured data. The best solution in the long term would be to switch to structured data and to build the HTML only on render in the template.
What if we actually check for relative paths?
if (Url::isAbsolute($url) === false && page($url) === null) {
@distantnative Where would this code snippet be placed?
Just alteration of Bastian's suggestion above
Ah, so you mean if a page with the provided URL doesn't exist, the URL is prepended with http://
, but if a page exists, the URL is kept as it is?
Yes. If it's a relative path that actually exists, don't turn it into an absolute path.
On the other hand, why is our HTML sanitizer so strict in the first place? What's the security threat of allowing "nonsense" (e.g. no protocol) in the href
tag? Honestly asking, cause there might be some, I just don't understand them yet. But if there aren't, maybe we could just make it less strict for the href
tag?
Yes. If it's a relative path that actually exists, don't turn it into an absolute path.
Ah, I understand what you mean now. However I see a few issues with this:
On the other hand, why is our HTML sanitizer so strict in the first place? What's the security threat of allowing "nonsense" (e.g. no protocol) in the
href
tag?
To be honest, the URL parsing rules for URLs without a protocol have been too complex for me to think every possibility through. Since Sane
is a security feature, I decided for the "better strict than sorry" approach with the goal to relax the rules later if (and only if) needed. Nothing would be worse than a security feature that can still be attacked. 100 % security does not exist and I'm sure I still forgot edge cases, but at least it's implemented to my best knowledge.
Here we indeed have such a case where a relaxed behavior may be needed. I will see if I can find a way to allow relative URLs while still catching attacks.
✅
With these writer-changes my sites currently are "broken".
None of this works anymore:
/abc/def
, beeing handled by an own plugin herejavascript:cookieconsent.show()
All the values gets removed.
Maybe add an option to NOT sanitize writer-hrefs at all?
@swiegmann Links starting with a slash should actually work as long as your Kirby site is not installed in a subfolder. JavaScript URIs are indeed not supported by default. What we could do is to offer a blueprint option to disable sanitization for a specific writer field all-together. Could you please add this idea to our feedback platform at https://kirby.nolt.io? In this closed issue we cannot track your suggestion and would forget about it. Thanks!
Describe the bug
The
writer
field occasionally does not render anhref
attribute in its links. I believe this happens when the entered link does not contain a protocol. Unfortunately, no warning is given and it seems like this also translates to the newemail
button.To Reproduce
Steps to reproduce the behavior:
writer
field in any blueprint.email
button.href
attribute.* It seems that adding
http://
works but the protocol is removed in thehref
attribute.https://
works AND will be included in thehref
.Expected behavior
url
field) or render thehref
attribute as it was entered.href
attribute should be rendered when using theemail
button.Kirby Version
3.6.0-beta.2