apostrophecms / sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis. Built on htmlparser2 for speed and tolerance
MIT License

sanitize-html not acknowledging allowedSchemes options #679

Open asrv4git opened 1 month ago

asrv4git commented 1 month ago


To Reproduce

Step-by-step instructions to reproduce the behavior:

1. Use version 2.13.1 of sanitize-html.
2. Run the code below.

var sanitizeHtml = require("sanitize-html");

const ALLOWED_SCHEMES = ['http', 'https'];

const htmlStr = `\'"><meta http-equiv="refresh" content="0;url=file:///etc/passwd" />`;

const cleanedHTML = sanitizeHtml(htmlStr, {
    allowedAttributes: false,
    allowedTags: false,
    allowVulnerableTags: true,
    allowedSchemes: ALLOWED_SCHEMES,
    allowProtocolRelative: false,
    disallowedTagsMode: 'completelyDiscard',
    allowedSchemesByTag: {
        img: [...ALLOWED_SCHEMES, 'data']
    },
});

console.log(cleanedHTML);

Actual behavior

'"&gt;<meta http-equiv="refresh" content="0;url=file:///etc/passwd" />

Expected behavior

'"&gt;<meta http-equiv="refresh" content="0" />

Describe the bug

Even though I have configured sanitize-html to allow only the 'http' and 'https' schemes, the 'file' scheme is still allowed in the content="0;url=file:///etc/passwd" attribute.

Details

Version of Node.js: 18 LTS

PLEASE NOTE: Only stable LTS versions (10.x and 12.x) are fully supported, but we will do our best with newer versions.

Server Operating System: Linux, and yes, Docker is involved.

boutell commented 1 month ago

The "content" attribute of the meta tag, in the presence of an http-equiv="refresh" attribute, doesn't take just a URL; it takes a combination of a timeout, a semicolon, and a URL. sanitize-html has no special logic for validating this attribute, so allowedSchemes is not applied to the URL embedded in it. We are unlikely to add such logic, because it would be quite unusual to allow this attribute at all: it can be used to redirect the user literally anywhere on the Internet, even if "file" is not allowed. In most cases that would not be desirable or safe behavior.

However, if you choose to allow these attributes, you can sanitize them your own way using the transformTags option. Check out that option in the documentation.

That being said: I also don't see where you allowed the content and http-equiv attributes at all, so I think there could be more going wrong here, but your code was not escaped properly by GitHub, so it is hard to say. If you open a "code block" in a GitHub comment using three backticks on one line, paste your code on the following lines, and then add another line with three backticks, you should get a proper code block that escapes your code so I can read it fully.
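
For readers who do decide to allow meta refresh, the transformTags approach mentioned above could look roughly like the sketch below. It is not part of sanitize-html: `sanitizeRefreshContent` and `metaTransform` are hypothetical helper names, and the parsing is a simplified assumption about the "timeout; url=..." format rather than a full implementation of the HTML refresh algorithm. It deliberately drops relative and protocol-relative targets, which is a conservative choice and may not suit every application.

```javascript
// Hypothetical helpers; neither function ships with sanitize-html.
const ALLOWED_SCHEMES = ['http', 'https'];

// Takes a refresh value such as "0;url=file:///etc/passwd" and returns a
// cleaned value, or null when the value is not recognizable at all.
function sanitizeRefreshContent(content, allowedSchemes) {
  // Simplified format: <delay> [ ";" or "," [ "url=" ] <target> ]
  const match = /^\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*))?$/i.exec(content || '');
  if (!match) {
    return null;
  }
  const delay = match[1];
  // Strip optional surrounding quotes from the target.
  const target = (match[2] || '').trim().replace(/^['"]|['"]$/g, '');
  if (target === '') {
    return delay;
  }
  let scheme;
  try {
    // new URL() throws for relative and protocol-relative URLs (no base given).
    scheme = new URL(target).protocol.slice(0, -1);
  } catch {
    return delay; // conservative: keep only the delay
  }
  return allowedSchemes.includes(scheme) ? `${delay};url=${target}` : delay;
}

// Matches the (tagName, attribs) => { tagName, attribs } shape that the
// transformTags option expects for a per-tag transform function.
function metaTransform(tagName, attribs) {
  const httpEquiv = (attribs['http-equiv'] || '').trim().toLowerCase();
  if (httpEquiv !== 'refresh') {
    return { tagName, attribs };
  }
  const cleaned = sanitizeRefreshContent(attribs.content, ALLOWED_SCHEMES);
  const next = { ...attribs };
  if (cleaned === null) {
    delete next.content;
  } else {
    next.content = cleaned;
  }
  return { tagName, attribs: next };
}
```

To try it, pass `transformTags: { meta: metaTransform }` alongside the other options in the sanitizeHtml call from the report. With the input from this issue, the helper reduces the content value to "0", which matches the expected behavior described above.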