dcwatson / bbcode

A pure python bbcode parser and formatter.
BSD 2-Clause "Simplified" License
68 stars 17 forks source link

Image tag unsupported #2

Closed zauberparacelsus closed 11 years ago

zauberparacelsus commented 11 years ago

The parser does not appear to support the [img] bbcode tag. Like the other issue I reported, I'll need to code around it.

dcwatson commented 11 years ago

It should be pretty simple to add your own [img] tag formatter, something like this:

def render_img(name, value, options, parent, context):
    opts = ''.join(['%s="%s" ' % (k, v) for k, v in options.items()])
    return '<img src="%s" %s/>' % (value, opts)
parser = bbcode.Parser()
parser.add_formatter('img', render_img)

[img] tags were not included in the default formatters, since they are a security concern for some applications. Loading external resources via img tags is not always a good idea without some sanitizing of the inputs, which is a little out of scope for this library.

interlab commented 11 years ago

sorry my english

I think that it is necessary to check extensions at images (png, jpeg, jpg, gif, ...)

[img]http://evil.org/hack.js[/img]

interlab commented 11 years ago
def check_img_ext(val):
    if type(val) is str:
        return val.endswith(('.png', '.jpg', '.gif'))
    return False

def render_img(name, value, options, parent, context):
    if not check_img_ext(value):
        return value
    opts = ''.join(['%s="%s" ' % (k, v) for k, v in options.items()])
    return '<img src="%s" %s/>' % (value, opts)

parser = bbcode.Parser()
parser.add_formatter('img', render_img)
dcwatson commented 11 years ago

That's probably a good idea, but I'm not sure it's fool-proof. Nothing is stopping you from naming a javascript file with ".gif" and serving it as text/javascript. Again, I intentionally left the [img] tag out of bbcode "core" for this reason, so you can sanitize the input however you see fit. If someone wants to write a "hardened" [img] tag implementation, it would be a good addition to the wiki.

interlab commented 11 years ago

ok, sorry, I think that hastened about the file format - after all it is possible to load images and by means of php (example: http://blablabla.com/img/avatars/thumb_123.php)

I found vulnerability(options):

[img width="64" heigth="36" xss="" /><script>alert('xss')</script>"]http://tiraspol.me/files/Smileys/default/dance.gif[/img]
dcwatson commented 11 years ago

Again, [img] is not part of this library, so I'm not sure what you mean when you say you've found a vulnerability with it. If it's something in the way I parse options generally, feel free to submit a pull request with a failing test case. Otherwise, any "vulnerability" with an [img] tag is in your code.

homeworkprod commented 10 years ago

While I agree with you that [img] should not be parsed by default for security reasons, it should be not too hard to implement it.

However, the automatic link replacement spoiled the fun.

Using a simple formatter:

parser.add_simple_formatter('img', '<img src="%(value)s"/>')

Input:

[img]http://example.com/image.png[/img]

The resulting HTML:

<img src="<a href="http://example.com/image.png">http://example.com/image.png</a>"/>

The result is correct when disabling automatic link replacement by setting the corresponding keyword (replace_links=False) on the parser. But I'd like to have both replacements available (as my target audience is used to that from the previous software).

I remember trouble like this from extending markdown parsers. It might be possible that switching the order in which the replacements are applied would solve the issue.

Then again, the bbcode implementation registers the tag replacements on a dictionary (the field recognized_tags) which has no order. Retrofitting it to respect order (e.g. using a list of pairs, or the OrderedDict which is available as of Python 2.7 and 3.1 as well as a stand-alone library) is not that hard, though.

dcwatson commented 10 years ago

I'm a bit lost - what do you mean when you say you'd like "both replacements available"? How is that img tag with replace_links=False not meeting your needs? And I'm not sure what changing recognized_tags to respect order will buy you, that dictionary is never actually enumerated.

homeworkprod commented 10 years ago

I'm a bit lost - what do you mean when you say you'd like "both replacements available"? How is that img tag with replace_links=False not meeting your needs? I mean I'd like to have generally both image tags replaced as well as plain URLs linked automatically (i. e. using replace_links=True), though not the same time¹: In case a URL is enclosed in [img]…[/img], it should be turned into an HTML image tag, but if it is not, it should be transformed to an HTML anchor.

¹) In that case, one would have to use [url=http://example.com/][img]http://example.com/image.png[/img][/url] as usual.

And I'm not sure what changing recognized_tags to respect order will buy you, that dictionary is never actually enumerated. Indeed, I didn't recognize that at first.

I just discovered that replace_links can also be set for a specific formatter. I was referring to setting it on the parser itself. The former is likely to solve my issue, I assume. Will try it out soon.

homeworkprod commented 10 years ago

OK, using the keyword on the formatter itself does work for me.

For the record:

import bbcode

def render_html(value):
    """Render text as HTML, interpreting BBcode."""
    parser = bbcode.Parser()

    # Replace image tags.
    def render_image(name, value, options, parent, context):
        return '<img src="{}"/>'.format(value)
    parser.add_formatter('img', render_image, replace_links=False)

    return parser.format(value)

Thanks for your response and your work on the library!

dzpt commented 3 years ago

@homeworkprod it won't work with url has double dash.

https://domain/suzuki-swace-2020-3--1600832941-width1004height565-auto-crop.jpg -> https://domain/suzuki-swace-2020-3&ndash;1600832941-width1004height565-auto-crop.jpg

dzpt commented 3 years ago

Its needed to have replace_cosmetic=False to prevent conversion -- to &ndash;

homeworkprod commented 3 years ago

@dzpt, that is correct. That's why I have disabled it for image tags and code blocks, but also in general (probably because I'd rather give up on nicer typography in "user-generated content" in favor of avoiding unexpected surprises).

I also have tests in place.

Thanks for mentioning that specific example, though; I've added a case that specifically checks that URLs are not changed regarding cosmetic replacements.