ONSdigital / csvw-check

A CLI to validate CSV-Ws (W3C's CSV on the Web standard).
Apache License 2.0

Number pattern validation - change current implementation with icu.ibm library #74

Closed: josepajay closed this issue 3 years ago

josepajay commented 3 years ago

The ICU (IBM) library currently used in the project does not fully comply with the UTS #35 (LDML) standard. For example, the number pattern #0.0#,# is valid according to UTS #35, but the library raises an exception when given this pattern.
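To make the disagreement concrete, here is a minimal Python sketch of a structural check that accepts #0.0#,#. It is not the csvw-check or ICU implementation. It assumes a simplified pattern shape (integer part: '#' digits before '0' digits; fraction part: '0' digits before '#' digits; ',' allowed between digits in either part) and ignores prefixes, suffixes, exponents, ';' subpatterns and '@' significant digits. The function name `is_valid_number_pattern` is hypothetical.

```python
import re

# Hypothetical, simplified check of the digit part of a UTS #35 number pattern.
# Assumption: ',' is the grouping separator and '.' the decimal separator;
# prefixes/suffixes, exponents, ';' subpatterns and '@' digits are out of scope.
_INTEGER = re.compile(r"[#,]*[0,]*")   # '#' digits first, then '0' digits
_FRACTION = re.compile(r"[0,]*[#,]*")  # '0' digits first, then '#' digits


def is_valid_number_pattern(pattern: str) -> bool:
    integer, _, fraction = pattern.partition(".")
    if "." in fraction:                      # more than one decimal separator
        return False
    if not any(c in "#0" for c in integer):  # need at least one integer digit
        return False
    if not _INTEGER.fullmatch(integer) or not _FRACTION.fullmatch(fraction):
        return False
    # A grouping separator must sit between two digits of the same part,
    # which (unlike the library) also permits grouping in the fraction.
    for part in (integer, fraction):
        if part.startswith(",") or part.endswith(",") or ",," in part:
            return False
    return True
```

Under these assumptions, the pattern from this issue, #0.0#,#, passes, while patterns such as 0# (a '#' after a '0' in the integer part) are rejected.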

The W3C test suite contains test cases with such patterns, e.g. validation test cases 282 and 285.

Test cases 155 and 158 are also ignored for now.

Update:

We raised this as a bug with ICU: https://unicode-org.atlassian.net/browse/ICU-21689. The problem turns out to be already known to the ICU (IBM) maintainers; the related issues are https://unicode-org.atlassian.net/browse/CLDR-7166 and https://unicode-org.atlassian.net/browse/ICU-10794.

The ICU (IBM) library distinguishes two grouping sizes: a primary grouping size, used for the least significant group of integer digits, and a secondary grouping size, used for all more significant groups.
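As an illustration of primary versus secondary grouping, the sketch below derives both sizes from the integer part of a pattern and applies them. It follows the rule described in the ICU DecimalFormat documentation (the interval between the last separator and the end of the integer gives the primary size; the interval between the last two separators gives the secondary size), but it is a hypothetical Python helper, not ICU code; `grouping_sizes` and `group_integer` are illustrative names.

```python
def grouping_sizes(integer_pattern: str) -> tuple[int, int]:
    """Return (primary, secondary) grouping sizes for an integer pattern part."""
    groups = integer_pattern.split(",")
    if len(groups) < 2:
        return (0, 0)  # no grouping separator, so no grouping
    primary = len(groups[-1])
    # With only one separator, the secondary size defaults to the primary size.
    secondary = len(groups[-2]) if len(groups) > 2 else primary
    return (primary, secondary)


def group_integer(digits: str, primary: int, secondary: int) -> str:
    """Insert ',' into integer digits: primary group first, then secondary groups."""
    if primary <= 0 or len(digits) <= primary:
        return digits
    head, groups = digits[:-primary], [digits[-primary:]]
    while len(head) > secondary:
        groups.insert(0, head[-secondary:])
        head = head[:-secondary]
    if head:
        groups.insert(0, head)
    return ",".join(groups)
```

For example, the pattern #,##,##0 gives a primary size of 3 and a secondary size of 2, so 123456789 is grouped as 12,34,56,789, whereas #,##0 groups 1234567 as 1,234,567.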

For grouping in the fractional part, it is ambiguous whether there should also be two grouping sizes (primary and secondary). This is the part of UTS #35 (LDML) that describes the grouping separators: http://www.unicode.org/reports/tr35/tr35-31/tr35-numbers.html#Number_Symbols
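One possible reading, shown here only to make the ambiguity concrete, is to count fractional groups left to right from the decimal point and use a single size taken from the first separator. This is an assumption, not behaviour confirmed by UTS #35, ICU or the W3C tests; the helpers `fraction_group_size` and `group_fraction` are hypothetical. Whether later separators (e.g. in 0#,#,##) should define a secondary fractional size is exactly the open question.

```python
def fraction_group_size(fraction_pattern: str) -> int:
    """Hypothetical: size of the first fractional group, counted from the decimal point.

    Only the first ',' is considered; any later separators are ignored.
    """
    head, sep, _ = fraction_pattern.partition(",")
    return len(head) if sep else 0


def group_fraction(digits: str, size: int) -> str:
    """Insert ',' every `size` digits, working left to right from the decimal point."""
    if size <= 0:
        return digits
    return ",".join(digits[i:i + size] for i in range(0, len(digits), size))
```

Under this reading, the fraction part 0#,# of #0.0#,# has a group size of 2, so the fractional digits 2345 would be written 23,45.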

We have decided to leave this issue for now. The reasons for this are:

Once everything else in this project (CSV-Validation) is done, we may have another go at this issue.

The IDE configuration to use when working on the ICU (IBM) project (Java part):

https://app.zenhub.com/files/304276433/5e35035f-440d-400b-9a0e-a0f57993f2cc/download