eemeli / message-resource-wg

Developing a standard for Unicode MessageFormat 2 resources
Determining the end of a message #6

Closed eemeli closed 1 year ago

eemeli commented 2 years ago

A message resource file will contain multiple messages, so when parsing we need to be able to detect where one message ends and the identifier for the next one starts. Within message contents, it's possible to look for the terminal }, but newlines may show up between any tokens.

With a valid message, the first token after whitespace including a newline may be any of the following:

Possible solutions:

  1. Require message values to be more indented than their identifiers.
  2. Indicate the end of a message or the start of a message identifier using some syntax that does not match any message tokens, such as "key" or , key.
  3. Do not allow newlines before or between when keys, removing the "any NmToken" class of possible line starts, and then require quoting any keys that overlap with remaining message syntax, e.g. "let" or "when-message".

stasm commented 2 years ago

Perhaps another option to consider?

  1. Delimit the entire body of the message, e.g. with {}: key {let $foo = {$bar} {value}}.

eemeli commented 2 years ago

I would prefer not quoting the message value, as we're getting pretty quote-y already. Consider these simple messages:

first {{Hello world}}
second {{Hello {$place}}}

To me, that requires some serious mental lifting to focus on the actual message contents.

In the meantime, I've thought of another possible solution:

  1. Always rely on the terminal } to end a message. For messages with multiple variants, detect the last case from its when * * signature.

That would pretty much require the parser to be single-pass. I'm pretty sure that error recovery would not be affected, and even something like a missing end brace would only eat the following message at most.

It would also not allow for potentially later supporting exhaustive cases (say, true and false for a boolean matcher) without a final when * {} or something similar.
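As a rough illustration of the terminal-} idea, here is a minimal single-pass sketch in Python. It is not taken from any actual parser: the names top_level_tokens and find_message_end are hypothetical, it assumes braces are never escaped, and it assumes the let / match / when keywords appear only in the positions used in this thread. A top-level {...} group ends the message unless it is a let value, a match selector, or the body of a variant whose keys are not all *.

```python
def top_level_tokens(src, pos=0):
    """Yield (kind, text, end) for each top-level word or balanced {...} group."""
    n = len(src)
    while pos < n:
        ch = src[pos]
        if ch.isspace():
            pos += 1
        elif ch == "{":
            depth, start = 0, pos
            while pos < n:
                if src[pos] == "{":
                    depth += 1
                elif src[pos] == "}":
                    depth -= 1
                    if depth == 0:
                        break
                pos += 1
            if pos >= n:
                raise ValueError("missing end brace")
            pos += 1  # step past the closing brace
            yield ("group", src[start:pos], pos)
        else:
            start = pos
            while pos < n and not src[pos].isspace() and src[pos] != "{":
                pos += 1
            yield ("word", src[start:pos], pos)


def find_message_end(src, pos=0):
    """Return the index just past the terminal } of the message body at pos."""
    state = "start"  # one of: start, let, match, keys, variants
    keys = []
    for kind, text, end in top_level_tokens(src, pos):
        if kind == "word":
            if state == "keys":
                keys.append(text)  # every word before the variant body is a key
            elif text in ("let", "match") and state == "start":
                state = text
            elif text == "when" and state in ("match", "variants"):
                state, keys = "keys", []
            # other words ($foo, =) carry no information for this sketch
        elif state == "let":
            state = "start"  # value of a declaration, not the pattern
        elif state == "match":
            pass  # a selector expression
        elif state == "keys":
            if keys and all(k == "*" for k in keys):
                return end  # the catch-all variant is the last one
            state = "variants"
        elif state == "variants":
            raise ValueError("expected 'when' before another variant body")
        else:
            return end  # a plain pattern ends the message
    raise ValueError("no terminal } found")
```

Because the scan stops at the first group that qualifies as terminal, whatever follows (the next identifier) is never consumed. The sketch does not attempt error recovery; an unbalanced brace simply raises.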

stasm commented 2 years ago

I think the direction to try and re-use the terminal } from the message's body is a good one. Perhaps detecting the last variant wouldn't be necessary if when is considered a reserved keyword in the resource syntax? #5 can help in cases when the message's key would need to be literally when.

eemeli commented 1 year ago

By requiring message values to always be indented, PR #11 allows us to easily distinguish the end of a message by finding unindented content.
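To make the indentation rule concrete, here is a minimal Python sketch of splitting a resource on unindented lines. It does not reproduce the syntax from PR #11: the function name split_messages is hypothetical, and it assumes that a key is the first whitespace-delimited token on an unindented line and that blank lines can be skipped.

```python
def split_messages(source):
    """Return (key, value) pairs, treating each unindented line as a new entry."""
    entries = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines are skipped in this sketch
        if line[0].isspace():
            # Indented content always continues the current message value.
            if not entries:
                raise ValueError("indented content before any key")
            entries[-1][1].append(line.strip())
        else:
            # Unindented content ends the previous message and starts a new one.
            parts = line.split(None, 1)
            key = parts[0]
            rest = parts[1].strip() if len(parts) > 1 else ""
            entries.append((key, [rest] if rest else []))
    return [(key, "\n".join(value)) for key, value in entries]
```

With this rule the splitter never needs to look inside a message value, so a value may start with any token, including let or when, without being mistaken for a key.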

eemeli commented 1 year ago

Just to note, this was indeed resolved by #11.