Open Qendolin opened 2 years ago
This library has a layered structure. Raw @bbob/parser
can be used to transform bbcodes to AST, or @bbob/core
for plugin support.
I don't include whole html5 attributes validation and tags spec cause the main goal of this library is to be tiny and raw so you can use it to create your own solutions.
Anyway library was safe by default
is a good point. I will try to investigate how I can achieve this without bloating this library.
I think the biggest problem is creating HTML as a string. Using the DOM is a lot safer as it prevents any kind of escape attempt.
For my site I've implemented a custom renderNode
like this:
const renderNode = (node, { stripTags = false }) => {
if (!node) return document.createTextNode('');
const type = typeof node;
if (type === 'string' || type === 'number') {
return document.createTextNode(node);
}
if (Array.isArray(node)) {
return renderNodes(node, { stripTags });
}
if (type === 'object') {
if (stripTags === true) {
return renderNodes(node.content, { stripTags });
}
const element = document.createElement(node.tag);
for (const name in node.attrs) {
try {
element.setAttribute(name, node.attrs[name]);
} catch (ignored) {
// Ignore invalid attribute names
}
}
if (node.content === null) {
return element;
}
element.append(...renderNodes(node.content));
return element;
}
return document.createTextNode('');
};
const renderNodes = (nodes, { stripTags = false }) => {
return nodes.flatMap((node) => renderNode(node, { stripTags }));
};
const toHTML = (source, plugins, options) =>
core(plugins).process(source, {
...options,
render: (nodes, opts) => {
// Document fragment is just a container element, so that the function only returns a single element
const frag = document.createDocumentFragment();
frag.append(...renderNodes(nodes, opts));
return frag;
}
}).html;
To make attributes and tag names whitlist-based I've also created my own processor
/**
* @param name the property name
* @param value the property value, will be escaped
* @returns The css style definition
*/
const renderStyle = (name, value) => {
value = value.replaceAll(`"`, ``).replaceAll(`'`, '').replaceAll(`;`, '');
return `${name}: ${value};`;
};
// Contains allowed styles
const styleMap = {
color: (val) => renderStyle('color', val)
}
/**
* @param attrs
* @returns The constructed style attribute
*/
const getStyleFromAttrs = (attrs) => {
let style = '';
for (const name in styleMap) {
if (name in attrs) {
style += styleMap[name](attrs[name]);
}
}
return { style };
};
/**
* @param attrs
* @returns A style with the color of the unique attribute
*/
const getColorStyle = (attrs) => {
return getStyleFromAttrs({
color: getUniqAttr(attrs) ?? 'inherit'
});
};
/**
* @param node
* @param render the render function
* @returns the url from the unique attribute or the node content
*/
const getUrlHref = (node, render) => {
let url = getUniqAttr(node.attrs);
url ??= render(node.content, { stripTags: true })?.textContent ?? '';
return url;
};
/**
* @param node
* @param render the render function
* @returns the url from the node cpntent
*/
const getImgSrc = (node, render) => {
if (typeof node.content == 'string') return node.content;
return render(node.content, { stripTags: true })?.textContent ?? '';
};
/**
* Creates a trusted node.
* Tag must be a trusted tag
* Attrs must only contain trusted attributes
*/
const toNode = (tag, attrs, content) => ({
tag,
attrs,
content,
trusted: true
});
export const definitions = {
b: (node) => toNode('b', {}, node.content),
i: (node) => toNode('i', {}, node.content),
u: (node) => toNode('span', { style: `text-decoration: underline;` }, node.content),
s: (node) => toNode('s', {}, node.content),
br: (node) => toNode('br', {}, node.content),
url: (node, { render }) =>
toNode(
'a',
{
href: getUrlHref(node, render),
// Also an important privacy improvement
rel: 'noreferrer noopener nofollow',
target: '_blank'
},
node.content
),
img: (node, { render }) =>
toNode(
'img',
{
src: getImgSrc(node, render),
// For privacy
referrerpolicy: 'no-referrer',
// For cross origin images
crossorigin: 'anonymous',
width: node.attrs.width,
height: node.attrs.height
},
[]
),
quote: (node) => toNode('blockquote', {}, node.content),
style: (node) => toNode('span', getStyleFromAttrs(node.attrs), node.content),
color: (node) => toNode('span', getColorStyle(node.attrs), node.content),
// ... and more (I left some out for brevity)
};
function process(tags, tree, core, options) {
tree.walk((node) => {
if (isTagNode(node)) {
// Create a trusted node from whitelisted tags
if (tags[node.tag]) {
return tags[node.tag](node, core, options);
}
if (node.trusted === true) {
// trusted nodes can be any tag or contain attributes that are not whitelisted
return node;
}
// Convert all non-whitelisted tags to spans and remove all attributes
return {
tag: 'span',
attrs: {},
content: node.content,
trusted: true
};
}
return node;
});
}
The most important differences are:
<span>
elementinnerHTML
rel
and referrerpolicy
to appropriate values
Script injection is still very easy to achieve. As mentioned in https://github.com/JiLiZART/BBob/issues/66#issuecomment-653926541 you can inject JavaScript via
on*
attributes.[aaa onclick=alert('Hacked')]Click Me[/aaa]
I don't think the solution is to blacklist certain attributes. IMO it shouldn't even be possible to set any attributes unless explicitly enabled in a preset. Even a
style
attribute can be used maliciously.Furthermore it is not save to allow the creation of an arbitrary html tag. For example some websites might use custom elements which themselves might be exploitable.
I'ld strongly prefer it this library was safe by default.