dcwatson / bbcode

A pure python bbcode parser and formatter.
BSD 2-Clause "Simplified" License

PhpBB BBCode url error #8

Closed frafra closed 10 years ago

frafra commented 11 years ago

Current output (taken directly from PhpBB database posts table):

>>> t="[url=http://www.dlvr.it/:3pzdg5xo]dlvr[/url:3pzdg5xo]"
>>> bbcode.render_html(t)
'<a href="http://www.dlvr.it/:3pzdg5xo">dlvr[/url:3pzdg5xo]</a>'

Expected output:

'<a href="http://www.dlvr.it/">dlvr</a>'

I don't know if this kind of BBCode is valid, but PhpBB uses the suffix like a comment (I don't know why it does that), and they state that they use BBCode, so maybe it should be recognized.

I suppose that bbcode could just strip these suffixes.

frafra commented 11 years ago

Possible fix:

import re

t = "[url=http://www.dlvr.it/:3pzdg5xo]dlvr[/url:3pzdg5xo]"
# Drop a ":suffix" that sits directly before a closing bracket.
t_clean = re.sub(r"(:\w+)(?=\])", "", t)

frafra commented 11 years ago

I found another special case, where ":" appears twice in the same tag:

[list=1:2ru5nwtc]\n[*:2ru5nwtc]Sponsor[/*:m:2ru5nwtc]

To clean this, the colon group has to be allowed to repeat, otherwise ":m" is left behind:

re.sub(r"(:\w+)+(?=\])", "", t)
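
A quick check of both samples, as a sketch rather than a tested fix. The names `single`, `repeated`, `url_post` and `list_post` are made up for this example, and the repeated group is an assumption about how the `:m:uid` form should be handled.

```python
import re

# Pattern from the first fix: removes only the last ":word" before "]".
single = re.compile(r"(:\w+)(?=\])")
# Assumed variant: the group may repeat, so ":m:2ru5nwtc" goes in one match.
repeated = re.compile(r"(?::\w+)+(?=\])")

url_post = "[url=http://www.dlvr.it/:3pzdg5xo]dlvr[/url:3pzdg5xo]"
list_post = "[list=1:2ru5nwtc]\n[*:2ru5nwtc]Sponsor[/*:m:2ru5nwtc]"

print(repr(single.sub("", url_post)))     # '[url=http://www.dlvr.it/]dlvr[/url]'
print(repr(single.sub("", list_post)))    # '[list=1]\n[*]Sponsor[/*:m]'
print(repr(repeated.sub("", list_post)))  # '[list=1]\n[*]Sponsor[/*]'
```

Both patterns strip any ":word" that ends right before "]", so a URL ending in a port number (for example ":8080]") would lose the port as well.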

dcwatson commented 11 years ago

Yuck. Looks like phpBB is cramming a UID into the tags themselves for some reason. Their parsing code is at https://github.com/phpbb/phpbb3/blob/develop/phpBB/includes/bbcode.php. Stripping them out via a regex might be one way to go. Another would be updating the tag parser to recognize them, but I'd rather not go down that path unless this is actually a common practice with an established syntax. I'd like to think on this a little before putting any code in. You can always strip out the UIDs via a regex before passing it to bbcode in the meantime.
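
The pre-processing workaround could look roughly like the sketch below. It is not part of bbcode, and the helper name `strip_phpbb_uids` is hypothetical. It assumes, based only on the two samples in this thread, that a UID is 8 lowercase alphanumeric characters, is optionally preceded by a one-letter marker such as ":m", and always sits directly before the closing "]" of a tag.

```python
import re

# Assumptions drawn from the samples in this thread, not from phpBB docs:
# UID = 8 chars of [0-9a-z], optional one-letter marker like ":m" before it.
_PHPBB_UID = re.compile(r"(?::[a-z])?:[0-9a-z]{8}(?=\])")
# A single-line bracketed tag such as "[url=...]" or "[/*:m:2ru5nwtc]".
_TAG = re.compile(r"\[[^\[\]\n]*\]")


def strip_phpbb_uids(text: str) -> str:
    """Remove phpBB-style UID suffixes, touching only text inside [...] tags."""
    return _TAG.sub(lambda m: _PHPBB_UID.sub("", m.group(0)), text)


cleaned = strip_phpbb_uids("[url=http://www.dlvr.it/:3pzdg5xo]dlvr[/url:3pzdg5xo]")
print(cleaned)  # [url=http://www.dlvr.it/]dlvr[/url]
```

The cleaned string can then be passed to `bbcode.render_html` as in the original report. Restricting the match to 8-character suffixes inside tags leaves things like `[url=http://example.com:8080]` alone, at the cost of missing UIDs if phpBB ever uses a different length.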

frafra commented 11 years ago

This is what I'm doing actually. Thank you :)

dcwatson commented 10 years ago

I think trying to interpret colons in tag options is generally a bad idea. Closing. Thanks for the input, though.