formatjs / formatjs-old

The monorepo home to all of the FormatJS related libraries.
https://formatjs.io/
156 stars 53 forks source link

intl-messageformat-parser: Some issues I ran into migrating to v4 #607

Closed Janpot closed 4 years ago

Janpot commented 4 years ago

Which package?

intl-messageformat-parser

Is your feature request related to a problem? Please describe.

  1. My library is using intl-messageformat-parser. To enable xml-like syntax in v3 I did some postprocessing on the AST. Which enabled me to allow self-closing tags. The new parser doesn't allow self-closing tags. Which means I can't move to the latest version as it fails to parse input that it used to parse before.

  2. My library also has a mode to disable xml syntax. This is now impossible with v4 as it always attempts to parse tags.

  3. It also enabled me to more clearly describe errors for invalid syntaxes. i.e. I could differentiate invalid input like follows and show clear error messages about why it's invalid

    <a.b></a.b> // Error: invalid tag name
    <a b="c"></a> // Error: no attributes allowed
    <></> // Error: invalid tag name

    Now all of these syntaxes throw the same general parser error which isn't very descriptive:

    Expected "#", "'", "\n", "{", argumentElement, double apostrophes, end of input, or tagElement but "<" found.

    While this is not really a breaking change for me, it would be nice to be able to show more descriptive errors for invalid syntax.

Describe the solution you'd like

  1. Allow self-closing tags in the parser. children can just be an empty array, or null in the AST node.

  2. Add a mode on the parser to turn off parsing tags and treat them as normal strings.

  3. More descriptive error messages on syntax errors.

Describe alternatives you've considered

Remove self-closing tags feature and string mode from my library (breaking change) or stay on v3 of the parser.

longlho commented 4 years ago
  1. Self-closing tags do get parsed in the parser, it's just a text node instead. In general self-closing tags are pointless because ICU already has {var} syntax.
  2. We don't plan to allow any settings to tune parsing. It's a PEG parser so doing that is non-trivial.
  3. Same reason as above, since it's a PEG parser, when it encounters a token that doesn't fit in the schema it triggers a syntax error. Keeping context would be non-trivial.
Janpot commented 4 years ago

I'm fine with not supporting self-closing tags, but you essentially just created a parser that fails on what used to be valid ICU messages. I think an option to enable/disable xml syntax is not such a crazy request.

We don't plan to allow any settings to tune parsing. It's a PEG parser so doing that is non-trivial.

Just skimmed through the docs. But it looks like options can be passed in the initializer.

The code inside the initializer can access options passed to the parser using the options variable.

Wouldn't it be just a matter of returning a literal node here instead of a tag node when an option xml: false is set? It's not trivial I guess, but doesn't seem crazy to me neither.

I'm looking into trying it out but running into weird line ending issues. Are you guys using windows exclusively on this project or something?

longlho commented 4 years ago

As I mentioned we don't plan to tweak this behavior. This powers our linter as well. Tag support is a popular ask and will be in the next standard of MessageFormat that we're working on.