OrchardCMS / OrchardCore

Orchard Core is an open-source modular and multi-tenant application framework built with ASP.NET Core, and a content management system (CMS) built on top of that framework.
https://orchardcore.net
BSD 3-Clause "New" or "Revised" License
7.31k stars 2.36k forks source link

Sanitization is causing some shortcodes to not render correctly in the MarkdownBody #16226

Open DrewBrasher opened 2 months ago

DrewBrasher commented 2 months ago

I have a module that adds a shortcode. The short code creates a span with a data- attribute but the attribute does not get rendered in the MarkdownBody if the Sanitize HTML checkbox is checked in the part settings.

You can reproducer the issue by using the Blog Template and my Content Warning Module (https://github.com/DrewBrasher/OrchardCoreModules).

The HtmlBody renders the shortcode correctly even with Sanitization turned on so I think the MarkdownBody should do the same.

This was discussed some in discussion https://github.com/OrchardCMS/OrchardCore/discussions/13643#discussioncomment-9433727

MikeAlhayek commented 2 months ago

Alternative approach is to add exclusion rule so that your custom data is not removed with something like this

services.Configure<HtmlSanitizerOptions>(o =>
{
    o.Configure.Add(new System.Action<Ganss.Xss.HtmlSanitizer>(o =>
    {
        o.AllowedAttributes.Add("data-custom...");
    }));
});
DrewBrasher commented 2 months ago

Thanks @MikeAlhayek, I've added that and it works.

Piedone commented 2 months ago

The basic question is, should shortcode outputs run through sanitization? I don't think so, but since they inject something into the HTML output of the part/field, it might be hard to try to keep them separate.

DrewBrasher commented 2 months ago

I see several places in the source where it looks like the sanitization could just be moved to before the shortcode processing. Here is one example: https://github.com/OrchardCMS/OrchardCore/blob/7196b5b1c6c0180b17bb4dabe944232001b0a681/src/OrchardCore.Modules/OrchardCore.Markdown/Drivers/MarkdownBodyPartDisplayDriver.cs#L103-L113

But there may be other things I don't know about that would need to be considered.

Piedone commented 2 months ago

Also something to keep in mind is that shortcode inputs can be used for HTML/script injection. So, they themselves should be sanitized, but actually the shortcode output as well, since you can build an offending code by combining the shortcode with something that in itself isn't. Hmm, maybe we should keep this as it is.

DrewBrasher commented 2 months ago

That makes sense to me. The solution @MikeAlhayek provided solved the issue I was having.

MikeAlhayek commented 2 months ago

I think it's safe to evaluate short-codes after sanitizing. The idea behind sanitizing it to ensure that the use does not add malicious code. But with shortcodes, we provide HTML that is kind of trusted.

Piedone commented 2 months ago

The problem is that sanitizing shortcode inputs doesn't guarantee that the output won't be malicious (well, sanitization is not bulletproof anyway, but still). There are all kinds of creative tricks for XSS, including tricking shortcode like this.

sebastienros commented 2 months ago

The HtmlBody renders the shortcode correctly even with Sanitization turned on so I think the MarkdownBody should do the same.

It would be interesting to understand why.

sebastienros commented 2 months ago

@deanmarcussen can you refresh our memory about when we can or cannot use shortcodes and what to expect wrt sanitization? Then we'll write it in the docs so we don't forget.

github-actions[bot] commented 2 months ago

We triaged this issue and set the milestone according to the priority we think is appropriate (see the docs on how we triage and prioritize issues).

This indicates when the core team may start working on it. However, if you'd like to contribute, we'd warmly welcome you to do that anytime. See our guide on contributions here.

DrewBrasher commented 2 months ago

It would be interesting to understand why.

It looks like to me that for the HtmlBody, the sanitization happens on update while the shortcode is still in shortcode notation. https://github.com/OrchardCMS/OrchardCore/blob/198c566313593a1c0f42ed58b64184daf8387d82/src/OrchardCore.Modules/OrchardCore.Html/Drivers/HtmlBodyPartDisplayDriver.cs#L76 Instead of after the shortcode is rendered to hrml.

deanmarcussen commented 2 months ago

so probably the point here, is that by default Markdown does not support adding extra html, through shortcodes, whereas an Html editor, specifically does.

the idea being that Markdown is/was considered safe by not allowing any extra html in it, whereas Html is considered questionable, as you can write html in it, so we just santize essentially, that you didn't add a script to it.

You can't sanitize shortcodes themselves when they are rendered, you need to santize the total output of the shortcodes, in the medium it is being used, because otherwise you could write split shortcodes, which when the output is combined would generate bad html.

@MikeAlhayek 's suggested approach here is the appropriate one I believe.