apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance
MIT License
3.68k stars 349 forks source link

How to avoid return of corrupted clean-text after for self closed tags? #566

Closed lxtadhamnawito closed 1 year ago

lxtadhamnawito commented 1 year ago

I'm trying to sanitize a string that is provided as data for xml-configurator, this data is in the form of { key:value} and it's applied among the xml so it can make some xss attacks.

I made some trials and configurations for the package to try it and works fine with some tags, but for others specially when they are self closed ones (<div /> instead of <div></div>) it behaves strange. It starts to remove the "/>" and put &gt and create another closing tag at the end of the object and everything gets messy 😀

Configuration and function I'm using

const SANITIZATION_CONFIGURATION = {

    allowedTags: sanitizeHtml.defaults.allowedTags.concat(['img', 'iframe', 'video', 'embed']),
    allowedSchemesByTag: {
        img: ['data']
    },
    allowedAttributes: {
        'p': ['style'],
        'span': ['style'],
        'li': ['style'],
        'a': ['href', 'style'],
        'img': ['src', 'style'],
        's': ['style'],
        'strong': ['style'],
        'iframe': ['src', 'style', 'width', 'height',],
        'h1': ['style'],
        'h2': ['style'],
        'h3': ['style'],
        'h4': ['style'],
        'table': ['style'],
        'td': ['style'],
        'tr': ['style'],
        'blockquote': ['style'],
        'video': ['src', 'width', 'height', 'controls'],
        'embed': ['src', 'width', 'height']
    },
    disallowedTagsMode: 'escape'}
export const sanitizeText = (input) => {
    return new Promise((resolve) => {
        const cleanText = sanitizeHtml(input, SANITIZATION_CONFIGURATION);
        resolve(cleanText);
    })}

This is the data i'm passing the function sanitizeText

{
"video": "<video src='https://app.heartex.ai/static/samples/opossum_snow.mp4' width=100% muted /><img src onerror=\"$=n=>document.getElementsByTagName(n)[0];a=$('audio');v=$('video');a.onseeked=()=>{v.currentTime=a.currentTime};a.onplay=()=>v.play();a.onpause=()=>v.pause()\" />",
"videoSource": "https://app.heartex.ai/static/samples/opossum_snow.mp4"
}

This is the output i'm getting from sanitizeText

{
"video": "<video src="https://app.heartex.ai/static/samples/opossum_snow.mp4" width="100%"><img src />document.getElementsByTagName(n)[0];a=$('audio');v=$('video');a.onseeked=()=&gt;{v.currentTime=a.currentTime};a.onplay=()=&gt;v.play();a.onpause=()=&gt;v.pause()\" /**&gt**;",
"videoSource": "https://app.heartex.ai/static/samples/opossum_snow.mp4"
}</video> 

Details

Node version : v16.13.2

Operating sys: windows 10

boutell commented 1 year ago

Please clarify how this can be used to create an XSS exploit. It looks like the output includes some { and } but that has no effect when it's just part of text between tags in HTML. I realize you would have liked htmlparser2 to resolve the > in some other way, but quotes around an attribute do not canonically mean that > stops meaning "end of tag" in HTML. Changing the tolerance level for this is something that would have to happen in the htmlparser2 module (if it's even a good idea — I haven't gone deep on this to figure that out), and I don't see any XSS vulnernability thus no bug in sanitize-html.

But if there's a way of getting this to cause the browser to execute arbitrary javascript code that I'm missing, please provide such an example as a PR that adds a new unit test, along with instructions for how to see the exploit happen in the browser. If you're concerned about doing that here, you can send it to security@apostrophecms.com.

lxtadhamnawito commented 1 year ago

It could be used by adding some onnull or onerror with a script running inside them, but I understand that be the configuration provided we allow the available attributes so they won't work if we didn't pass them to the tags we are using "which is pretty good actually" .

in-order to apply this you have to use something like ace-editor or xml editor that will accept these fields and reflect them inside the editor itself.

for sanitize-html, actually it do the job as it should, but my question was about the deformation of the input after it gets sanitized as it's a bit non ordered "why this is happening?"

And I would do my best to make a PR with an example with the editor and the provided data that could run my example above.

Thank you.

lxtadhamnawito commented 1 year ago

@boutell I was looking for selfClosing which made most of the job for me -_-

One more question, why does sanitaization change any attribute from single quotes to double quotes. if you made sanitize("<img src='http://somelink.com' width='100%' />) it will change to <img src="http://somelink.com" width="100%" />

boutell commented 1 year ago

If I recall properly htmlparser2 doesn't tell us how the attribute was originally quoted, only what the value was. It is true that single quotes are valid HTML too but in the absence of a way to know what was preferred we use the most widely used quoting strategy.