eemeli / message-resource-wg

Developing a standard for Unicode MessageFormat 2 resources

Add initial resource.abnf #11

Closed eemeli closed 1 year ago

eemeli commented 1 year ago

Closes #8, and presents initial answers for other issues, to which I'll add relevant comments.

An initial ABNF (RFC 5234, RFC 7405) syntax for a resource format is added, to act as a starting point.

The format is closely related to TOML and other .ini-like formats, as discussed in #8. Compared to what's presented there, the only real divergence is that "quoted keys" are left out, and instead \ escape sequences are well defined for both keys and values. This allows any non-empty string to be used as either a key or a value, and ensures that a resource is made up only of printable characters.

A key feature of the syntax is that multiline values are the only indented content, and that each line after the first in a value will have all of its leading whitespace stripped. This allows for a rather clean separation between message and resource syntaxes, so that a single file could be parsed by either a single-pass or multi-pass parser. The cost of this choice is that messages containing lines after the first with leading whitespace (i.e. messages that would match the regexp /\n[ \t]/) need to add a \ escape before the first contentful whitespace on the line.
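
As a rough illustration of the indentation rule, here is a Python sketch that writes a message as an indented value and reads it back. The names `indent_value` and `strip_indent` are hypothetical, the two-space indent is just the one used in the examples, and the sketch ignores every other escape sequence (it assumes the message contains no backslashes of its own), so it should not be read as the parser implied by resource.abnf.

```python
def indent_value(message: str, indent: str = "  ") -> str:
    """Serialize a message as an indented multiline resource value."""
    first, *rest = message.split("\n")
    out = [first]
    for line in rest:
        # Leading whitespace would be lost when the indent is stripped,
        # so protect it with a backslash before the first such character.
        if line[:1] in (" ", "\t"):
            line = "\\" + line
        out.append(indent + line)
    return "\n".join(out)


def strip_indent(value: str) -> str:
    """Recover the message text from an indented multiline value."""
    first, *rest = value.split("\n")
    out = [first]
    for line in rest:
        # All leading whitespace on a continuation line is indentation.
        line = line.lstrip(" \t")
        # A backslash before a space or tab marks contentful whitespace.
        if line[:2] in ("\\ ", "\\\t"):
            line = line[1:]
        out.append(line)
    return "\n".join(out)
```

With these two helpers, a message whose third line starts with two spaces round-trips through the indented form, at the price of one added backslash on that line.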

eemeli commented 1 year ago

Some examples:

# Example message resource
# This first two-line comment attaches to the whole resource

key = {value}
some.greeting = {Hello {$place}!}
kääk\ \&\ yök = {Message with key "kääk & yök"}

multiline = {This message
  consists of three lines,
  \  and this line starts with two spaces.}

[section]

# This is a standalone comment.

# This comment attaches to the with-vars message.
with-vars =
  let $foo = {$bar :number}
  {The count is {$foo}}

with-variants =
  let $foo = {$bar :number}
  match {$foo}
  when one {You have {$foo} thing.}
  when * {You have {$foo} things.}

# This comment attaches to the section.sub section
[section.sub]

key = {Inner message at section.sub.key}
key . sub.key = {Message at section.sub.key.sub.key}
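
The way section headers and dotted keys combine in the examples above can be sketched in Python as follows. This is only an illustration under assumptions: `message_ids` and `split_path` are hypothetical names, the regular expressions are a simplification rather than a translation of resource.abnf, and escaped `.` or `=` characters inside keys are not handled.

```python
import re

# Simplified line patterns; assumptions for this sketch only.
SECTION = re.compile(r"^\[(.*)\]\s*$")
ENTRY = re.compile(r"^([^\s#\[][^=]*?)\s*=")


def split_path(path: str) -> list[str]:
    """Split a dotted path, ignoring whitespace around each part."""
    return [part.strip() for part in path.split(".")]


def message_ids(source: str) -> list[str]:
    """Return the full dotted id of every message in the source."""
    section: list[str] = []
    ids: list[str] = []
    for line in source.splitlines():
        if m := SECTION.match(line):
            # A section header replaces the current prefix entirely.
            section = split_path(m.group(1))
        elif m := ENTRY.match(line):
            # Indented continuation lines never match ENTRY, because
            # only unindented lines can start a new message.
            ids.append(".".join(section + split_path(m.group(1))))
    return ids
```

Applied to the last section of the example, the entry `key . sub.key` under `[section.sub]` yields the id `section.sub.key.sub.key`, matching the text of that message.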

eemeli commented 1 year ago

I added escapes for all symbols/punctuation for the id, and expanded its allowed character set. This should make it much rarer to need to resort to \x or \u escapes for the identifier.
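
To make the escaping idea concrete, here is a small Python sketch of how an escaped key might be decoded. The exact escape forms are assumptions for illustration: `\x` is taken to be followed by two hex digits and `\u` by four, and any other backslash-escaped character stands for itself. The ABNF in this PR is the authority on the real forms, and `unescape_key` is a hypothetical name.

```python
import re

# Assumed escape forms: \xHH, \uHHHH, or a backslash before any other
# single character (space, punctuation, symbol).
ESCAPE = re.compile(r"\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|(.))", re.DOTALL)


def unescape_key(raw: str) -> str:
    """Decode backslash escapes in a key, under the assumptions above."""

    def replace(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        if hex_digits:
            # Numeric escape: convert the code point to its character.
            return chr(int(hex_digits, 16))
        # Character escape: the escaped character stands for itself.
        return match.group(3)

    return ESCAPE.sub(replace, raw)
```

Under these assumptions the key from the example, `kääk\ \&\ yök`, decodes to "kääk & yök" without any numeric escapes. A real parser would also want to reject malformed sequences such as `\x` followed by non-hex characters, which this sketch silently treats as a literal `x`.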

flodolo commented 1 year ago

A couple of thoughts looking at the examples, as I'm not really familiar with ABNF and formal syntax definitions:

eemeli commented 1 year ago

> Fluent has explicitly defined resource and group comments, which I find very useful. Is there a reason to not support those?

Comments may also attach to sections, which provides functionality similar to Fluent group comments; I've now added an example for this.

I agree that some syntax -- as opposed to just position -- may be needed for resource-level comments. Maybe something like Fluent's ###, or maybe a JSDoc-like @resource tag in a comment? In any case, we should figure out a solution while we also figure out the language for in-comment metadata.

> Wouldn't it be better to actually prevent spaces from being used in IDs?

I'm not sure that they should be prevented at the language level; I'd rather handle that in validation. Allowing any non-empty string may for instance make it easier to automate a migration from gettext to MF2.

flodolo commented 1 year ago

Spaces in IDs remind me of iOS (XLIFF) and gettext using the string as an ID, which is just terrible. So, in a way, I'd be all for preventing spaces at the language level.

Where can I learn more about sections? What's the benefit of using them? (namespacing?)

eemeli commented 1 year ago

Merging to give the initial spec a bit more visibility.