JiLiZART / BBob

⚡️Blazing fast js bbcode parser, that transforms and parses bbcode to AST and transform it to HTML, React, Vue with plugin support in pure javascript, no dependencies
https://codepen.io/JiLiZART/full/vzMvpd
MIT License
164 stars 19 forks source link

Flawed getUniqAttr Parsing May Corrupt HTML Output #202

Open mmichaelis opened 1 year ago

mmichaelis commented 1 year ago

https://github.com/JiLiZART/BBob/blob/3575982b280cc45c9cedaf7a059491a324c1b514/packages/bbob-plugin-helper/src/helpers.js#L77-L88

Without understanding the details, the description may benefit from some enhanced description (see below). Given my assumptions and tests are correct, I will refer to a possibly even dangerous flaw in getUniqAttr handling, which can be summarized as: You can fake unique attributes within BBCode.

Suggestion for Description Enhancement

/**
 * Given a record of string to some value, this method will
 * retrieve the last entry in the record and return its key
 * when it is equal to its value.
 *
 * Such entries typically represent so-called _unique attributes_
 * after parsing, so that `[url=someUrl]` gets parsed to an
 * attributes object like: `{ someUrl: "someUrl" }`.
 *
 * @example
 * getUniqAttr({ 'foo': true, 'bar': bar' }) => 'bar'
 * @example
 * getUniqAttr({ 'bar': bar', 'foo': true }) => null
 * @param attrs - record of strings to attribute values
 * @returns {string|null} `null`, if no unique attribute could be determined
 */

The Flaw

BBCode Actual HTML Expected HTML (Suggestion)
[url fakeUnique=fakeUnique]T[/url] <a href="fakeUnique">T</a> <a href="T" fakeUnique="fakeUnique">T</a>
[url=https://example.org/ fakeUnique=fakeUnique]T[/url] <a href="fakeUnique">T</a> <a href="https://example.org/" fakeUnique="fakeUnique">T</a>
[url=https://example.org/ hidden]T[/url] <a href="hidden">T</a> <a href="T" hidden="hidden">T</a>
[url=https://example.org/ hidden]T[/url] <a href="hidden">T</a> <a href="T" hidden="hidden">T</a>
[table=onclick][tr][td]T[/td][/tr][/table] <table onclick="onclick"><tr><td>T</td></tr></table> undecided
[table onclick=onclick][tr][td]T[/td][/tr][/table] <table onclick="onclick"><tr><td>T</td></tr></table> undecided

Stumbled across this while trying to add a sanitizer that forbids on* attributes to be created during processing.

Thus, the attribute found as being "unique" may not always have been "unique" within the original BBCode.

Perhaps one possible option would be using a Symbol() as key for the unique attribute. But I did not dive into parsing, if this is even feasible.

JiLiZART commented 1 year ago

Main goal of getUniqAttr is to handle situations where attribute name is a tag name for bbcodes without any attributes but with attribute value like [url=example.com]T[/url] any other variations of usage are falsy. Maybe it's a bad decidion for handle this as { 'example.com': example.com } maybe better will be handling this as { url: 'example.com' }, but it breaking change.

Missunderstandings will be fixed when I done TypeScript support

mmichaelis commented 1 year ago

Maybe it's a bad decidion for handle this as { 'example.com': example.com } maybe better will be handling this as { url: 'example.com' }, but it breaking change.

Yes, it will be a breaking change (and we will for sure stumble across this when the behavior changes; hopefully by some integration tests that will trigger us on update).

Introducing Symbols (which may be the best option here regarding non-conflicting behavior), an alternative representation may be some characters, that will never make it to an attribute name (at least when parsing BBCode). A space character may be the best option here, or a closing bracket:

To smoothen updates, all API clients should be advised to only access unique attributes via dedicated API instead of applying their own guesses on the behavior. Possibly even a challenge for us, as we try to prevent any on* handlers to sneak into the attributes.

mmichaelis commented 1 year ago

This issue becomes more severe, when you want to have tags, that only handle "unique attributes" differently, but keep all others as is. Thus, if you want to stick to the default mapping, for all not explicitly mapped tags/elements.

Example input without extra mapping configured:

[unknown=unique fakeUnique=fakeUnique]T[/unknown]

will become:

<unknown unique="unique" fakeUnique="fakeUnique">T</unknown>

which, in case of providing an alternative mapping for [url] where you "keep all other but unique attributes" will trigger the following mapping:

[url=https://example.org/ fakeUnique=fakeUnique]T[/url]

to

<a https://example.org/=https://example.org/ href="fakeUnique">T</a>

Follow-Up Options:

The following follow-up options may influence a possible solution to this issue.

Option 1) As soon as this issue is fixed, I would suggest adapting attrsToString to even ignore unique attributes, which are left after processing:

https://github.com/JiLiZART/BBob/blob/3575982b280cc45c9cedaf7a059491a324c1b514/packages/bbob-plugin-helper/src/helpers.js#L64

Thus, the [unknown] example above, should rather be mapped to:

<unknown fakeUnique="fakeUnique">T</unknown>

Doing so now, without fixing this issue, may cause other subsequent bugs and breaking behavior.

Option 2) Or perhaps a very different approach, where you may leave attrsToString untouched: Choose a unique attribute representation that uses as "namespace" prefix. Example: bbob:unique. Without explicit mapping, the [unknown] example would then render to:

<unknown fakeUnique="fakeUnique" bbob:unique="unique">T</unknown>

Option 3) Or, not to use namespaces, but data-attributes instead, which then again represents a valid HTML attribute:

<unknown fakeUnique="fakeUnique" data-bbob-unique="unique">T</unknown>

(accessible in JavaScript via el.dataset.bbobUnique).