Closed by frafra 10 years ago
Possible fix:
import re
t = "[url=http://www.dlvr.it/:3pzdg5xo]dlvr[/url:3pzdg5xo]"
t_clean = re.sub(r"(:\w+)(?=])", "", t)
I found another special case, where ":" appears twice in the same tag:
[list=1:2ru5nwtc]\n[*:2ru5nwtc]Sponsor[/*:m:2ru5nwtc]
In order to clean this, this regex is needed:
re.sub(r"(:\w+)+(?=])", "", t)
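One way to cover both cases is to let the pattern consume one or more colon-prefixed segments that sit directly before the closing bracket. The following is only a minimal sketch: the helper name `strip_uid_suffixes` is hypothetical, and the pattern is a guess at what works for the two samples quoted in this thread, not something taken from phpBB or from the bbcode library.

```python
import re

# Hypothetical helper; not part of phpBB or the bbcode library.
# Matches one or more ":segment" groups, but only directly before "]".
UID_SUFFIX = re.compile(r"(?::\w+)+(?=\])")


def strip_uid_suffixes(text):
    """Remove trailing ":xxx" segments from the end of BBCode tags."""
    return UID_SUFFIX.sub("", text)


samples = [
    "[url=http://www.dlvr.it/:3pzdg5xo]dlvr[/url:3pzdg5xo]",
    "[list=1:2ru5nwtc]\n[*:2ru5nwtc]Sponsor[/*:m:2ru5nwtc]",
]
for sample in samples:
    print(strip_uid_suffixes(sample))
```

Note that this is deliberately greedy: anything of the form `:word]` is removed, so a tag option such as `[url=http://example.com:8080]` would lose its port number. It is probably only safe on text known to come from a phpBB posts table.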
Yuck. Looks like phpBB is cramming a UID into the tags themselves for some reason. Their parsing code is at https://github.com/phpbb/phpbb3/blob/develop/phpBB/includes/bbcode.php. Stripping them out via a regex might be one way to go. Another would be updating the tag parser to recognize them, but I'd rather not go down that path unless this is actually a common practice with an established syntax. I'd like to think on this a little before putting any code in. You can always strip out the UIDs via a regex before passing it to bbcode in the meantime.
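If the UID of each post is known in advance, the pre-processing step can be narrower and remove only that exact UID rather than any colon-prefixed word. The sketch below assumes the UID is available as a plain string (phpBB appears to store it next to the post text, but the way it is obtained here is an assumption) and that at most one single-character marker, such as the `:m` in `[/*:m:2ru5nwtc]`, precedes it. The function name `strip_known_uid` is hypothetical.

```python
import re


def strip_known_uid(text, uid):
    """Remove ":<uid>" (and an optional one-character marker before it)
    from the end of BBCode tags. Hypothetical helper, not a library API."""
    # Optional ":x" marker, then the literal UID, only directly before "]".
    pattern = re.compile(r"(?::\w)?:" + re.escape(uid) + r"(?=\])")
    return pattern.sub("", text)


raw = "[list=1:2ru5nwtc]\n[*:2ru5nwtc]Sponsor[/*:m:2ru5nwtc]"
cleaned = strip_known_uid(raw, "2ru5nwtc")
print(cleaned)  # pass `cleaned` to the BBCode parser afterwards
```

Because the UID is matched literally, other colons inside tag options, such as a port number in a URL, are left alone.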
This is what I'm doing actually. Thank you :)
I think trying to interpret colons in tag options is generally a bad idea. Closing. Thanks for the input, though.
Current output (taken directly from the phpBB database posts table):
Expected output:
I don't know if this kind of BBCode is valid, but phpBB uses it like a comment (I don't know why it does that), and they declare that they use BBCode, so maybe it should be recognized.
I suppose that bbcode could just strip them.