FriendsOfFlarum / upload

The file upload extension with insane intelligence for your Flarum forum.
https://discuss.flarum.org/d/4154
MIT License

Unicode in captions breaks tags #174

Open jtojnar opened 5 years ago

jtojnar commented 5 years ago

When I change the image caption to something containing non-ASCII characters, the image no longer renders, and the raw markup is displayed instead.

Example markup:

[upl-image uuid=617f4df1-b92e-4df0-8cde-751e3add55e5 size=8kB url=https://ostrov-tucnaku.cz/assets/files/2019-04-15/1555328116-191577-image.png]Bílé tričko[/upl-image]

clarkwinkelmann commented 4 years ago

I ran some tests while updating the extension today, but I'm not sure what the best approach is. If we replace SIMPLETEXT with TEXT, formatting becomes enabled inside the caption, which isn't good: it creates paragraphs and could contain other markup.

TextFormatter doesn't provide an "any text, but not formatted" data type out of the box. I'm not sure how safe it would be to use a regex type here.

Basically, the limitation is that SIMPLETEXT restricts the value to /^[- +,.0-9A-Za-z_]+$/, as per https://s9etextformatter.readthedocs.io/Filters/Built-in_filters/
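To make the limitation concrete, the following sketch checks the caption from the report against the SIMPLETEXT character class quoted above. It uses Python's re module purely for illustration (TextFormatter itself is PHP); is_simpletext is a hypothetical helper, and fullmatch() stands in for the ^...$ anchors.

```python
import re

# Character class of TextFormatter's SIMPLETEXT filter, per the docs linked above.
# fullmatch() anchors the pattern at both ends, like ^...$ in the original.
SIMPLETEXT = re.compile(r"[- +,.0-9A-Za-z_]+")


def is_simpletext(value: str) -> bool:
    """Hypothetical helper: True if the value would pass the SIMPLETEXT check."""
    return SIMPLETEXT.fullmatch(value) is not None


# A slugified file name stays within the ASCII whitelist...
print(is_simpletext("1555328116-191577-image.png"))  # True
# ...but the accented caption from the report does not.
print(is_simpletext("Bílé tričko"))  # False
```

This is consistent with the observation that default, slugified captions render while edited captions with diacritics do not.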

Unless the user edits the text, I don't think it breaks under normal usage, since we slugify the file name by default.

jtojnar commented 4 years ago

Well, just using the file name as a caption is not very descriptive. I would consider it a bug if the library does not offer a Unicode plain-text placeholder. How does the regex placeholder work? If matching is ungreedy, .* should work; if the whole template consists of a single regex, .*? should be used instead:

https://regexr.com/4tpps
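The greedy versus lazy distinction can be illustrated with Python's re module. The sketch below extracts captions from two [upl-image] tags on the same line; the attribute values and example.com URLs are made up, and this is not how TextFormatter parses BBCodes internally. It only shows why .*? matters when a single regex spans the whole tag.

```python
import re

text = (
    "[upl-image uuid=a url=https://example.com/a.png]Bílé tričko[/upl-image] "
    "[upl-image uuid=b url=https://example.com/b.png]Červená mikina[/upl-image]"
)

# Greedy: .* runs to the LAST closing tag, swallowing the second image.
greedy = re.findall(r"\[upl-image [^\]]*\](.*)\[/upl-image\]", text)

# Lazy: .*? stops at the FIRST closing tag, giving one caption per image.
lazy = re.findall(r"\[upl-image [^\]]*\](.*?)\[/upl-image\]", text)

print(greedy)  # one match containing "[/upl-image] [upl-image ..." in the middle
print(lazy)    # ['Bílé tričko', 'Červená mikina']
```

Note that . does not match newlines by default, so multi-line captions would additionally need re.DOTALL in Python (the s modifier in PCRE).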

clarkwinkelmann commented 4 years ago

The problem is that I don't know how easily we can use a regex-based attribute in a BBCode template in TextFormatter. Maybe it's easy, maybe not; I haven't been able to test it yet.

My hope is that, by using some sort of {REGEX:.*} rule in the template, we can allow any content inside the BBCode while preventing the parser from enabling formatting inside it.
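As a rough illustration of what a Unicode plain-text rule could accept, the sketch below keeps SIMPLETEXT's punctuation but replaces 0-9A-Za-z_ with \w, which in Python 3 str patterns matches letters and digits from any script plus the underscore. UNICODE_SIMPLETEXT and accepts_caption are hypothetical names; this is not a TextFormatter filter, and a PHP/PCRE equivalent would need its own Unicode behaviour verified. Square brackets stay excluded, so a caption cannot contain another BBCode tag, which is the "no parsing inside" property described above.

```python
import re

# Hypothetical Unicode-aware variant of SIMPLETEXT: same punctuation whitelist,
# but \w (Unicode-aware for str patterns in Python 3) replaces 0-9A-Za-z_.
# "[" and "]" are still excluded, so BBCode tags cannot appear in a caption.
UNICODE_SIMPLETEXT = re.compile(r"[- +,.\w]+")


def accepts_caption(caption: str) -> bool:
    """Return True if the whole caption matches the whitelist."""
    return UNICODE_SIMPLETEXT.fullmatch(caption) is not None


print(accepts_caption("Bílé tričko"))         # True: accented letters count as \w
print(accepts_caption("Bílé [b]tričko[/b]"))  # False: brackets and "/" are rejected
```

The trade-off is that a whitelist like this still rejects legitimate characters such as quotes or slashes; a blacklist of only "[" and "]" would be more permissive but would need careful escaping on output.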

clarkwinkelmann commented 3 years ago

I'm not sure what happened with commit 026c09e9a6781e0455b78f197eb5a2fb0928ed37. I don't remember whether it was part of a PR that was rejected or whether it got lost during a rebase. Maybe we should check again whether allowing formatting in the caption is still a deal-breaker.