Closed josepajay closed 2 years ago
I've implemented an initial approach to validating number formats (and the numbers represented with them) using Scala's parser combinator functionality. I've added a number of unit tests to describe and ensure the functionality.
Work yet to do:
After looking at the problems related to issues #76 and #74, we've concluded that we're not implementing number format validation in the way the W3C CSV-W spec suggests we should. Admittedly, the spec isn't very clear on how the validation should work, but the test cases suggest that we're not quite doing it right.
For instance, a format using an optional digit char (#) in the exponent part, i.e. 000.00E#0, suggests that the number 123.45E67 should be valid, but 12345E678 should not be valid because there are too many digits there.
Unfortunately the IBM-ICU tool doesn't recognise said format even though it's in one of the CSV-W test cases.
Essentially the IBM-ICU library is focused on using the UTS-35 spec to format numbers into strings, whereas the W3C CSV-W spec requires us to use the same characters but to use them to parse numbers. So optional digits add some complexity that we can't work around without writing a parser.
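To make the optional-digit problem concrete, here is a minimal sketch of treating the format as a parsing pattern rather than a formatting one. It is not the parser combinator implementation in this PR: it swaps in a plain regular-expression translation so that it runs with only the Scala standard library. The names `NumberFormatSketch`, `formatToRegex` and `isValid` are hypothetical, and the strictly positional reading of `0` and `#` is an assumption rather than something the CSV-W spec states.

```scala
import scala.util.matching.Regex

object NumberFormatSketch {
  // Translate a simplified UTS-35-style pattern into a regular expression.
  // Assumption: '0' is a mandatory digit, '#' is an optional digit, and every
  // other character (such as '.' or 'E') must appear literally in the input.
  def formatToRegex(format: String): Regex = {
    val body = format.toList.map {
      case '0'   => "[0-9]"                       // exactly one digit required
      case '#'   => "[0-9]?"                      // a digit may or may not be present
      case other => Regex.quote(other.toString)   // literal separator or exponent marker
    }.mkString
    body.r
  }

  // A value is valid only if the whole string matches the translated pattern.
  def isValid(format: String, value: String): Boolean =
    formatToRegex(format).pattern.matcher(value).matches()
}
```

Under these assumptions, `NumberFormatSketch.isValid("000.00E#0", "123.45E67")` accepts the first example above and `NumberFormatSketch.isValid("000.00E#0", "12345E678")` rejects the second. Grouping characters, signs, percent marks and unbounded integer parts are deliberately left out, which is part of why a proper parser is needed.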
So, we need to prototype and implement a parser which ensures that we pass the W3C CSV-W tests with numbers:
See the W3C validation test cases here