antlr / stringtemplate4

StringTemplate 4
http://www.stringtemplate.org

Correctly handle surrogate pairs in StringRenderer.encodeHTML #261

Closed Clashsoft closed 4 years ago

Clashsoft commented 4 years ago

Changed the StringRenderer.encodeHTML method, i.e. the implementation behind format="xml-encode", so that it supports Unicode characters that Java encodes as two chars (surrogate pairs). One case where this problem occurred was emojis, as outlined in #260. The old implementation escaped each char separately and produced two invalid HTML entities, `&#55358;&#56947;`, for the two surrogates that encode the emoji "🩳". After this change it generates a single entity for the whole code point, namely `&#129651;` (ref.: https://unicode-table.com/de/1FA73/).
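The idea described above can be sketched as follows. This is a minimal illustration and not the actual patch: the class name `HtmlEncodeSketch` is hypothetical, and the set of escaped characters (`&`, `<`, `>`, control characters, and anything above code point 126) is an assumption about how the renderer behaves. The relevant part is that the loop walks the string by code point using `String.codePointAt` and `Character.charCount`, so a surrogate pair becomes one numeric entity instead of two.

```java
// Hypothetical sketch, not the StringTemplate source.
public final class HtmlEncodeSketch {
    private HtmlEncodeSketch() {}

    public static String encodeHTML(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder buf = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            // Reads a full code point; combines a high and low surrogate if present.
            int cp = s.codePointAt(i);
            switch (cp) {
                case '&': buf.append("&amp;"); break;
                case '<': buf.append("&lt;"); break;
                case '>': buf.append("&gt;"); break;
                default:
                    // Assumed rule: escape control chars (except CR, LF, TAB) and non-ASCII.
                    boolean control = cp < ' ' && cp != '\r' && cp != '\n' && cp != '\t';
                    if (control || cp > 126) {
                        buf.append("&#").append(cp).append(';');
                    } else {
                        buf.append((char) cp);
                    }
            }
            // Advance by 1 char for BMP code points, by 2 for surrogate pairs.
            i += Character.charCount(cp);
        }
        return buf.toString();
    }
}
```

With this approach, `HtmlEncodeSketch.encodeHTML("\uD83E\uDE73")` (the two chars that make up U+1FA73) yields the single entity `&#129651;`, whereas a loop over `charAt(i)` would emit one entity per surrogate.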

Closes #260

parrt commented 4 years ago

Thanks, @Clashsoft !!