eemeli / message-resource-wg

Developing a standard for Unicode MessageFormat 2 resources

Escaping hard-to-print characters #1

Closed: eemeli closed this issue 1 year ago

eemeli commented 2 years ago

Some characters are hard to represent visually or for a user to type. To include them in a message resource, it should be possible to escape them using their Unicode code points.

Possible solutions:

  1. Use the common backslash-prefixed Unicode escape codes with hexadecimal values, such as \xab, \uabcd, and \Uabcdef.
  2. Also allow escaping common whitespace characters, such as \n, \r, and \t.
  3. Do not allow character escapes.
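
Options 1 and 2 combined can be illustrated with a small decoder. This is a minimal Python sketch, not a proposed specification: the digit counts (two for \x, four for \u, six for \U, following the \Uabcdef example above) and the function name unescape are assumptions made for illustration.

```python
import re

# Hypothetical grammar combining options 1 and 2:
#   \xHH, \uHHHH, \UHHHHHH  -> the code point written with that many hex digits
#   \n, \r, \t              -> newline, carriage return, tab
_ESCAPE = re.compile(
    r"\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|U([0-9a-fA-F]{6})|([nrt]))"
)
_SIMPLE = {"n": "\n", "r": "\r", "t": "\t"}


def unescape(source: str) -> str:
    """Replace each recognized escape sequence with the character it denotes.

    Backslashes not followed by a recognized sequence are left untouched.
    """

    def replace(m: re.Match) -> str:
        hex_digits = m.group(1) or m.group(2) or m.group(3)
        if hex_digits is None:
            return _SIMPLE[m.group(4)]
        cp = int(hex_digits, 16)
        # Six hex digits can exceed U+10FFFF; surrogates are not characters.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point in {m.group(0)!r}")
        return chr(cp)

    return _ESCAPE.sub(replace, source)
```

For example, unescape(r"\U01f600") yields the single character 😀, and unescape(r"a\tb") yields "a", a tab, and "b". Rejecting out-of-range values and surrogates matters here because six hex digits can express numbers that are not valid Unicode scalar values.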

stasm commented 2 years ago

Back in the Fluent days, our hope was that most Unicode characters would be written verbatim, as the Unicode glyphs themselves. If the translation wants to use 😀, then it should just use that glyph directly rather than escape it as \U01f600.

The main motivation for adding Unicode escape sequences was therefore to make non-printable or invisible characters stand out in the translation's source. The most notable example was the non-breaking space.
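
The policy described here (glyphs written verbatim, escapes only for characters that would otherwise be invisible) can be sketched as an encoder for tooling that writes resources. This is a hypothetical Python helper, not Fluent's implementation; treating every character in the Unicode "Other" (C*) and "Separator" (Z*) general categories except the ordinary space U+0020 as invisible is an assumption.

```python
import unicodedata


def escape_invisible(text: str) -> str:
    """Escape invisible or non-printing characters; keep everything else verbatim.

    Assumed policy: escape any character whose general category starts with
    "C" (Other) or "Z" (Separator), except the ordinary space U+0020.
    """
    out = []
    for ch in text:
        if ch != " " and unicodedata.category(ch)[0] in ("C", "Z"):
            cp = ord(ch)
            # Four hex digits for the BMP, six for the supplementary planes.
            out.append(f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:06x}")
        else:
            out.append(ch)
    return "".join(out)
```

With this rule, a non-breaking space (category Zs) is written as \u00a0, while 😀 (category So) is left as-is. One trade-off: U+200D ZERO WIDTH JOINER is category Cf, so it would also be escaped inside emoji sequences, which a real tool might prefer to leave alone.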

I'm providing this little bit of historical context in order to advocate against option (3).

eemeli commented 1 year ago

The solution in #11 combines options 1 and 2, and additionally adds escapes for spaces, tabs, and the relevant syntax characters. It also specifically ensures that escape sequences defined in MF2 do not need to be double-escaped.
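
One way to read "no double-escaping" is that the resource layer decodes its own escapes but passes MF2's escapes through with the backslash intact, so the MF2 message parser sees them unchanged. The following is a minimal Python sketch under that reading. It assumes the MF2 escapes are \\, \{, \} and \|, uses a hypothetical function name, and does not reproduce the actual grammar from #11; the space and code-point escapes are omitted for brevity.

```python
# Assumed MF2 message-level escapes: kept with their backslash so that the
# MF2 parser, not the resource parser, interprets them.
MF2_ESCAPES = {"\\", "{", "}", "|"}
# Hypothetical resource-level escapes that are decoded here.
RESOURCE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t"}


def unescape_resource_value(value: str) -> str:
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch != "\\" or i + 1 == len(value):
            # Ordinary character, or a lone trailing backslash kept as-is.
            out.append(ch)
            i += 1
            continue
        nxt = value[i + 1]
        if nxt in MF2_ESCAPES:
            out.append(value[i : i + 2])  # pass the MF2 escape through unchanged
        elif nxt in RESOURCE_ESCAPES:
            out.append(RESOURCE_ESCAPES[nxt])
        else:
            raise ValueError(f"unknown escape sequence at index {i}: {value[i:i + 2]!r}")
        i += 2
    return "".join(out)
```

Under this reading, a resource value written as Hello \{name\}\n reaches the MF2 parser with \{ and \} still escaped and \n already turned into a newline, so a literal brace needs only a single backslash in the resource file.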