Parser has Issues with Blanks and Quotes in Unique Attributes

mmichaelis commented 11 months ago

While possible a bad example, the following will produce corrupted data in AST:

[url=javascript:alert('XSS ME');]TEXT[/url]
[url=javascript:alert("XSS ME");]TEXT[/url]

This can already be seen in the HTML Render demo, that outputs the (processed) AST tree as:

[
  {"tag":"a","attrs":{"href":"ME');"},"content":["TEXT"]},
  "\n",
  {"tag":"a","attrs":{"href":");"},"content":["TEXT"]},"\n"
]

If debugging the parsed tree prior to processing, the first line is represented as:

[
  {
    "tag": "url",
    "attrs": {
      "javascript:alert('XSS": "javascript:alert('XSS",
      "ME');": "ME');"
    },
    "content": [
      "TEXT"
    ]
  }
]

the second one as:

[
  {
    "tag": "url",
    "attrs": {
      "javascript:alert(\"XSS ME": "javascript:alert(\"XSS ME",
      ");": ");"
    },
    "content": [
      "TEXT"
    ]
  }
]

This is possible a similar issue to: #194.

While it may be argued, if the BBCode should not have used, for example, quotes for the first example (which works as expected):

[url="javascript:alert('XSS ME');"]TEXT[/url]

The pain point is, that typically BBCode origins from manually written markup. Thus, more fault-tolerance would be highly appreciated.

Alternative Challenges

[quote=J. D.]T[/quote]
[quote=J. "The T" D.]T[/quote]

JiLiZART commented 11 months ago

Pretty expected behavior, it's hard for parser to understand where ends one attribute value and start new one. So parser assumes that all attributes delimited by space. For single attribute tags like [url=someurl] I think need to ignore any symbols except ending tag ] and this behavior I think can solve another issue with [url=https://example.org/ fakeUnique=fakeUnique]T[/url], that will be transformed to <a href="https://example.org/ fakeUnique=fakeUnique">....

mmichaelis commented 11 months ago

For single attribute tags like [url=someurl] I think need to ignore any symbols except ending tag ].

Agreed. The question is, where to provide this information.

My gut feeling: It should be made available as an additional option to "unique arguments", so that tag mappings may decide, if they want to support, for example, [url=someurl target=_blank] (an option, BBob supports, currently), or if they prefer supporting blanks within the unique attribute.

This may come in combination with a different representation for "unique attributes" (#202). Perhaps even "in parallel" to existing tree data/TagNode representation, so that current usages do not break.

Suggested parsed structure in tree:

[
  {
    "tag": "url",
    "attrs": {
      // unchanged to stay compatible
      "javascript:alert('XSS": "javascript:alert('XSS",
      "ME');": "ME');"
    },
    // Suggestion for this issue.
    "attrString": "javascript:alert('XSS ME');",
    // Suggestion for #202; requires a decision though, what is the
    // best value here. I would vote for enforcing "value right behind the tag's "=",
    // which, for the given example, would evaluate then to: "javascript:alert('XSS", instead.
    "uniqAttr": "ME');",
    "content": [
      "TEXT"
    ]
  }
]

Regarding the existing API, it may be nicer to put these "extra attributes" into the attrs object, but it raises the probability to be breaking towards current usages of BBob.

Eventually, a decision has to be made for the existing presets, e.g., if HTML5 Preset switches to using attrString for the URL rather than the unique attribute.

JiLiZART / BBob

Parser has Issues with Blanks and Quotes in Unique Attributes #204