globalizejs / globalize

A JavaScript library for internationalization and localization that leverages the official Unicode CLDR JSON data
https://globalizejs.com
MIT License

Strictness of grouping separators when using Globalize.parseNumber #816

Closed TheDutchDevil closed 6 years ago

TheDutchDevil commented 6 years ago

While using Globalize to parse currency input, we ran into unexpected issues related to how strictly the grouping separator is handled. For instance, with Globalize initialized for the culture en-GB:

globalizationContext.parseNumber("100,0,00.00") // returns NaN
globalizationContext.parseNumber("100,000.00") // returns 100000

Now from what I found online this has been discussed before for this project. However, I would expect the behavior of parseNumber to be more in line with existing software, such as C#'s Convert.ToDecimal("100,0,00.00") or Java's NumberFormat.

We ran into this issue because we started noticing that values that our server can parse, and which are therefore valid numbers to our application, cannot be parsed or formatted by Globalize. This mismatch is causing us a considerable headache: we send a string to our back-end, which successfully parses it and saves it in the database. After the successful API call we then try to parse and format the number with Globalize. However, this fails, as Globalize can't parse a number with mismatched grouping separators.

Are you aware of a canonical way of solving this issue? Or is there a way to have Globalize parse a number with unbalanced grouping separators?

rxaviers commented 6 years ago

Yep, Globalize is strict about the grouping separator, which is useful for avoiding undesired cases such as silently accepting the wrong decimal separator:

en.parseNumber("100,00")
> NaN // Not 10000
en.parseNumber("100.00")
> 100

Having said that, there's no easy way to make it lenient in that regard. A user-land hack would be:

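// Note: .cldr.main doesn't work on precompiled globalize code
var group = en.cldr.main("numbers/symbols-numberSystem-latn/group");
en.parseNumber(
  "100,0,00.00".replace(
    // Escape the separator so that "." (e.g., for de) is not treated as a regex wildcard
    new RegExp(group.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"), "g"),
    ""
  )
);
> 100000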

rxaviers commented 6 years ago

I'm closing the issue, but I'd like to hear if you have any ideas.

TheDutchDevil commented 6 years ago

How would you envision the user-land hack working for a culture like German, where apparently both spaces and periods are acceptable grouping separators? Our customer requires us to support UK English, Dutch, French and German.

As for suggestions, I would expect a number with unbalanced grouping separators to be parseable, as the separator serves only a cosmetic purpose and does not convey any semantic information.

A thought that sprang to mind is adding an option to the number parser which specifies whether strict grouping separator usage should be enforced, or whether the parser should allow an unbalanced grouping separator. However, I don't know if that would be considered over-engineering.
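One way to picture such an option is as a thin wrapper around the existing parser. The sketch below is purely illustrative: makeNumberParser and the strictGrouping option are invented names, not Globalize APIs, and strictEnStandIn is a stand-in for en.parseNumber so that the example runs without CLDR data loaded.

```javascript
// Hypothetical wrapper, not a Globalize API: `strictGrouping` is an invented
// option name used only to illustrate the proposal.
function makeNumberParser(parseNumber, groupSymbol, options = {}) {
  const { strictGrouping = true } = options;
  return function parse(input) {
    if (strictGrouping) {
      // Default: defer entirely to the underlying strict parser.
      return parseNumber(input);
    }
    // Lenient: drop every grouping separator, wherever it appears.
    return parseNumber(input.split(groupSymbol).join(""));
  };
}

// Stand-in for en.parseNumber so the sketch runs without CLDR data.
// It accepts only well-formed groups of three (or no grouping at all).
function strictEnStandIn(s) {
  return /^\d{1,3}(,\d{3})*(\.\d+)?$|^\d+(\.\d+)?$/.test(s)
    ? Number(s.replace(/,/g, ""))
    : NaN;
}

const strict = makeNumberParser(strictEnStandIn, ",");
const lenient = makeNumberParser(strictEnStandIn, ",", { strictGrouping: false });

console.log(strict("100,0,00.00"));  // NaN
console.log(lenient("100,0,00.00")); // 100000
console.log(lenient("100,00"));      // 10000
```

The last call shows the trade-off: in lenient mode "100,00" becomes 10000, so a mistyped decimal separator is no longer rejected.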

rxaviers commented 6 years ago

Globalize only supports what's in CLDR, which uses . as the grouping separator for German https://github.com/unicode-cldr/cldr-numbers-full/blob/master/main/de/numbers.json#L19 (i.e., it won't accept a space).
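An application that nevertheless wants to accept spaces in German input would have to apply its own policy before calling the parser. The following is a minimal sketch under that assumption: the separator list is an application-level choice rather than CLDR data, all names are hypothetical, and strictDeStandIn stands in for de.parseNumber so the example is runnable on its own.

```javascript
// Application-level policy, not CLDR data: the set of characters this sketch
// treats as German grouping separators is an assumption.
const LENIENT_GROUP_SEPARATORS = {
  // period, space, no-break space, narrow no-break space
  de: [".", " ", "\u00A0", "\u202F"],
};

function stripSeparators(input, separators) {
  // split/join handles each separator literally, so "." needs no regex escaping.
  return separators.reduce((text, sep) => text.split(sep).join(""), input);
}

// Stand-in for de.parseNumber once grouping is gone: digits, optional ",decimals".
function strictDeStandIn(s) {
  return /^\d+(,\d+)?$/.test(s) ? Number(s.replace(",", ".")) : NaN;
}

function parseGermanLenient(input) {
  return strictDeStandIn(stripSeparators(input, LENIENT_GROUP_SEPARATORS.de));
}

console.log(parseGermanLenient("100.000,50"));  // 100000.5
console.log(parseGermanLenient("100 000,50"));  // 100000.5
console.log(parseGermanLenient("1.0.0 000,5")); // 100000.5
```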

Feel free to go ahead with your suggestion and elaborate on what the API should look like; then, after this is discussed, please submit a PR.

In general, globalize parsers should provide an inverse operation for globalize formatters, so that it enables UI input fields such as this proof of concept for date input. It's out of scope for globalize parsers to interpret any kind of free text.

TheDutchDevil commented 6 years ago

Yeah, I already checked the CLDR data and found out that it indeed only uses the period (.). But we'll cross that bridge when we come to it; I have to say that I am not entirely sure how common spaces in German numbers are.

I can further work on specifying the API, but I'll first have to discuss with our team whether we want to pursue this further as a solution to the problem. Because again, I just don't have any time for this.

I have to admit that I wouldn't immediately know how to design a meaningful inverse of a loose group separator setting for the formatter. As I wouldn't know how you could format a number with imprecise grouping.

rxaviers commented 6 years ago

> I can further work on specifying the API, but I'll first have to discuss with our team whether we want to pursue this further as a solution to the problem. Because again, I just don't have any time for this.

Ok.

> I have to admit that I wouldn't immediately know how to design a meaningful inverse of a loose group separator setting for the formatter. As I wouldn't know how you could format a number with imprecise grouping.

No, I don't think formatter needs any change. What I mean is that parser's goal is to transform a localized string (generated by the formatter) back to the canonical type (e.g., a number, or a date).
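The inverse relationship described here can be stated as a round-trip check: parse(format(n)) should give back n. Below is a minimal sketch of that property. formatEn and parseEn are simplified stand-ins written for this example (non-negative numbers, English symbols only); with Globalize one would presumably pass the functions returned by en.numberFormatter() and en.numberParser() instead, and the property would hold only for values the formatter represents without rounding.

```javascript
// Checks that parse(format(n)) === n for every sample value.
function isInverse(format, parse, samples) {
  return samples.every((n) => parse(format(n)) === n);
}

// Stand-in formatter: groups the integer part in threes with ",".
function formatEn(n) {
  const [int, frac] = String(n).split(".");
  const grouped = int.replace(/\B(?=(\d{3})+$)/g, ",");
  return frac ? grouped + "." + frac : grouped;
}

// Stand-in strict parser: accepts only what formatEn can produce.
function parseEn(s) {
  return /^\d{1,3}(,\d{3})*(\.\d+)?$/.test(s)
    ? Number(s.replace(/,/g, ""))
    : NaN;
}

console.log(formatEn(100000)); // 100,000
console.log(isInverse(formatEn, parseEn, [0, 12, 1234.5, 100000, 1234567])); // true
console.log(parseEn("100,0,00.00")); // NaN
```

Under this view "100,0,00.00" is rejected simply because no formatter output looks like that.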

TheDutchDevil commented 6 years ago

I'm afraid I'm not going to have the time to work on this. We're planning on using the user-land hack to at least have the front-end parse and format the entered data. How we're going to handle this for German is something we're not yet sure about, but that's further down the line.

> No, I don't think formatter needs any change. What I mean is that parser's goal is to transform a localized string (generated by the formatter) back to the canonical type (e.g., a number, or a date).

I understand what you mean there, but we're really using it to parse user input, and we want to give the user a large amount of flexibility when it comes to the grouping separator, especially as our flow of sending user input to the server is quite convoluted.

rxaviers commented 6 years ago

Ok