dcwatson / bbcode

A pure python bbcode parser and formatter.
BSD 2-Clause "Simplified" License

Output is html escaped even if `escape_html=False` #30

Closed jpstotz closed 5 years ago

jpstotz commented 5 years ago
import bbcode
parser = bbcode.Parser(escape_html=False)
print(parser.format("A' [b]A'[/b]"))

Expected output: A' <strong>A'</strong>

Actual output: A' <strong>A&#39;</strong>

Affected version: bbcode 1.0.32, tested with Python 3.6 and 3.7

dcwatson commented 5 years ago

The escape_html argument to the Parser class applies only to markup that is not inside a tag; markup inside tags is controlled by TagOptions.escape_html. If you really don't want to escape HTML inside a tag, you can add your own formatter for those tags.

Out of curiosity, why wouldn't you want HTML to be escaped inside tags? Obviously < and > need to be escaped, but if you'd rather not escape quotes (for instance) you could modify Parser.REPLACE_ESCAPE to exclude quotes.

jpstotz commented 5 years ago

Let me answer your question with a counter question:

Why do you want to limit your library to BBCode-to-HTML conversion? I want to convert BBCode to various formats: plain text, LaTeX, and so on. Forced HTML escaping is therefore really not helpful. I don't understand why an option to disable escaping should only work in a limited region.

BTW: In older versions this conversion worked without problems, so IMHO it is a bug. Please note that the example from my first post is just a minimal example to demonstrate the bug; it is not the full scenario I use the library in.

dcwatson commented 5 years ago

I certainly don't want to limit this library to HTML, but the tag formatters installed by default do render to HTML. I think the vast majority of the people using bbcode expect it to render to HTML, so I think this is reasonable. Given that, having the default HTML tag formatters do HTML escaping is a safe default.

That said, instantiating a parser like bbcode.Parser(install_defaults=False, escape_html=False) will give you something that does no HTML escaping or HTML tag rendering, and you can register your own tag formatters for LaTeX, plain text, or whatever else you'd like.