getkirby / kirby

Kirby's core application folder
https://getkirby.com
Other
1.27k stars 167 forks source link

Writer field: no href attribute rendered when link does not contain protocol #3754

Closed jonasfeige closed 2 years ago

jonasfeige commented 2 years ago

Describe the bug
The writer field occasionally does not render an href attribute in its links. I believe this happens when the entered link does not contain a protocol. Unfortunately, no warning is given and it seems like this also translates to the new email button.

To Reproduce
Steps to reproduce the behavior:

  1. Setup a writer field in any blueprint.
  2. Add a link without a protocol* OR use the new email button.
  3. Render the link in a template.
  4. The link will not contain an href attribute.

* It seems that adding http://works but the protocol is removed in the href attribute. https:// works AND will be included in the href.

Expected behavior

  1. The writer field should either not let the user submit a link without protocol (like the url field) or render the href attribute as it was entered.
  2. The href attribute should be rendered when using the email button.

Kirby Version
3.6.0-beta.2

afbora commented 2 years ago

@lukasbestle This issue about HTML Tidy PR on https://github.com/getkirby/kirby/commit/64615c9c22008a98ae79cd06015f76ac4e0bb8b9 commit.

bastianallgeier commented 2 years ago

jonasfeige commented 2 years ago

Appreciate the quick fix in Beta 3! Email links work now but regular links without a protocol still don’t render the href attribute. Is that intended?

bastianallgeier commented 2 years ago

@jonasfeige no, that's not intended. Do you have an example link?

jonasfeige commented 2 years ago

@bastianallgeier Of course. For example www.tagesschau.de gives me an anchor link without href. CleanShot 2021-10-11 at 10 45 55@2x

Both http://www.tagesschau.de and https://www.tagesschau.de give me a proper anchor link including href. CleanShot 2021-10-11 at 10 46 18@2x

bastianallgeier commented 2 years ago

Ahhh, now I get it. That makes sense. Our new HTML sanitizer throws out invalid URLs as soon as they don't have a protocol. It's maybe a bit too strict so far. I guess we need to find a work around for such scenarios. (cc @lukasbestle)

afbora commented 2 years ago

Basically, any text in the href without a protocol is assumed to be a relative path if there is no / or protocol.

If there is no protocol, it is relative and its implementation seems complex. See @lukasbestle comment: https://github.com/getkirby/kirby/pull/3769#issuecomment-937991372

jonasfeige commented 2 years ago

To me it would seem logical to simply require a protocol, since that is what devs/users are used to from the url field.

lukasbestle commented 2 years ago

I agree with @jonasfeige. The URL without protocol is invalid anyway as the browser won't go to the page the editor has expected. So I think we should rather validate the entered URL in the URL dialog so it doesn't even let the editor create a link with an invalid destination.

Of course this means that true relative links won't be allowed now. However that's already the case with the url field elsewhere. And validating relative URLs in the sanitization process is really difficult.

lukasbestle commented 2 years ago

There's one gotcha we need to be aware of: What happens to the content of existing writer fields? The HTML will stay like it is, but as soon as the field content is ever edited again, Kirby would remove the href attributes.

IMO the best solution would be to run the URL validation on all hrefs in the writer field before the field is saved. Invalid URLs would then prevent the field from being saved, so they can be corrected before they are sanitized away.

bastianallgeier commented 2 years ago

You are right. It's not as simple as I thought. We still need to allow real relative paths. Otherwise you can no longer create internal links. But what if we sanitize URLs in the input like this:

if (Url::isAbsolute($url) === false) {
  $absoluteUrl = 'http://' . $url;

  if (V::url($absoluteUrl) === true) {
    $url = $absoluteUrl;
  }
}

It looks a bit nasty, but I think it might do the trick. Anything I might have missed? Could a relative path still be mistaken as valid URL as soon as we prepend the protocol?

lukasbestle commented 2 years ago

Hm, wouldn't that change the meaning of the URL entirely? I.e. if I intend to create a relative link to projects/my-project, I wouldn't want to get https://projects/my-project (which is a valid URL!).

What if we pass all link destinations through Kirby's url() helper, which will normalize the URLs to be absolute? Then they are safe and won't get sanitized away (they also wouldn't need to be). Issue: The markup then gets dependent on the base URL of the site and won't be portable anymore. We already have this issue with image URLs as far as I remember, but it's a limitation caused by storing the writer content as HTML instead of as structured data. The best solution in the long term would be to switch to structured data and to build the HTML only on render in the template.

distantnative commented 2 years ago

What if we actually check for relative paths?

if (Url::isAbsolute($url) === false && page($url) === null) {
lukasbestle commented 2 years ago

@distantnative Where would this code snippet be placed?

distantnative commented 2 years ago

Just alteration of Bastian's suggestion above

lukasbestle commented 2 years ago

Ah, so you mean if a page with the provided URL doesn't exist, the URL is prepended with http://, but if a page exists, the URL is kept as it is?

distantnative commented 2 years ago

Yes. If it's a relative path that actually exists, don't turn it into an absolute path.

distantnative commented 2 years ago

On the other hand, why is our HTML sanitizer so strict in the first place? What's the security threat of allowing "nonsense" (e.g. no protocol) in the href tag? Honestly asking, cause there might be some, I just don't understand them yet. But if there aren't, maybe we could just make it less strict for the href tag?

lukasbestle commented 2 years ago

Yes. If it's a relative path that actually exists, don't turn it into an absolute path.

Ah, I understand what you mean now. However I see a few issues with this:

On the other hand, why is our HTML sanitizer so strict in the first place? What's the security threat of allowing "nonsense" (e.g. no protocol) in the href tag?

To be honest, the URL parsing rules for URLs without a protocol have been too complex for me to think every possibility through. Since Sane is a security feature, I decided for the "better strict than sorry" approach with the goal to relax the rules later if (and only if) needed. Nothing would be worse than a security feature that can still be attacked. 100 % security does not exist and I'm sure I still forgot edge cases, but at least it's implemented to my best knowledge.

Here we indeed have such a case where a relaxed behavior may be needed. I will see if I can find a way to allow relative URLs while still catching attacks.

bastianallgeier commented 2 years ago

swiegmann commented 2 years ago

With these writer-changes my sites currently are "broken".

None of this works anymore:

All the values gets removed.

Maybe add an option to NOT sanitize writer-hrefs at all?

lukasbestle commented 2 years ago

@swiegmann Links starting with a slash should actually work as long as your Kirby site is not installed in a subfolder. JavaScript URIs are indeed not supported by default. What we could do is to offer a blueprint option to disable sanitization for a specific writer field all-together. Could you please add this idea to our feedback platform at https://kirby.nolt.io? In this closed issue we cannot track your suggestion and would forget about it. Thanks!