CSV Schema
http://digital-preservation.github.io/csv-schema
Mozilla Public License 2.0

Validation expression for floating point numbers? #6

Open msrocka opened 8 years ago

msrocka commented 8 years ago

Is there a way to describe that a column contains plain floating point numbers (except by using a regex pattern)? There is a positiveInteger validation expression so I would expect that there is also something like decimal or double but could not find it.

Thank you

DavidUnderdown commented 8 years ago

There isn't at present (nor had it been slated for inclusion in 1.1 which is just being worked on). However, I did just realise today that we may have a forthcoming project for which it would be useful.

DavidUnderdown commented 8 years ago

I forgot yesterday that the Range Expression takes a Numeric Literal, which allows decimals. However, no expression allows exponent (aka scientific) notation. In CSV Schema 1.0 a range must have both an upper and a lower bound; in the forthcoming CSV Schema 1.1 you will be able to specify only a lower or only an upper bound. As with the Length Expression, the end of the range that is not to be bounded will be represented by an asterisk, i.e. range(20.5,*) would indicate that the value in that column must be at least 20.5, while range(*,20.5) would indicate that the column value must be at most 20.5.
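To make the bound semantics concrete, here is a minimal Python sketch of the check described above. It is not the validator's implementation: the function name `in_range`, the use of `None` for the unbounded (asterisk) end, and the `PLAIN_DECIMAL` pattern for what counts as a plain decimal are all assumptions made for illustration.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed shape of a plain decimal: optional minus sign, digits, optional
# fraction. Exponent notation such as "2.05e1" is deliberately rejected.
PLAIN_DECIMAL = re.compile(r"-?[0-9]+(\.[0-9]+)?")


def in_range(value: str, lower: Optional[str] = None, upper: Optional[str] = None) -> bool:
    """Return True if value is a plain decimal within the inclusive bounds.

    None stands in for the asterisk, i.e. that end of the range is unbounded.
    """
    if PLAIN_DECIMAL.fullmatch(value) is None:
        return False
    number = Decimal(value)
    if lower is not None and number < Decimal(lower):
        return False
    if upper is not None and number > Decimal(upper):
        return False
    return True


# "At least 20.5": lower bound only.
print(in_range("20.5", lower="20.5"))
print(in_range("20.4", lower="20.5"))
# "At most 20.5": upper bound only.
print(in_range("21", upper="20.5"))
```

`Decimal` is used rather than `float` so that a bound like 20.5 is compared exactly as written.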

msrocka commented 8 years ago

@DavidUnderdown Thank you for your answers. I like the extension of the range expression.

DavidUnderdown commented 8 years ago

Looking over the full EBNF, we have a Positive Integer Expression and a PositiveIntegerLiteral, but while we have a Numeric Literal there is no Numeric Expression to allow a simple check that a column holds a real number expressed as a decimal. Range will take such values, so a Numeric Expression would be useful to check that the data is of the correct type before trying the range (or we may want to apply the range when the value is numeric, and some other check, such as is("N/A"), when the column has a non-numeric value). I'll see if we can get this into 1.1.

DavidUnderdown commented 8 years ago

Sorry, we didn't get this into 1.1, but hopefully using range covers most of the necessary functionality. I'll leave this open for now.

marceloverdijk commented 3 years ago

I'm looking for this as well. Is the best way currently to use a regex? (e.g. regex("[0-9]*\.?[0-9]+"))
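For anyone copying that pattern, a quick Python sketch of which values it accepts may help. The helper `matches` is hypothetical, and using `fullmatch` assumes the validator applies the pattern to the whole cell value, which is not confirmed in this thread.

```python
import re

# The pattern from the comment above, written as a Python raw string.
PATTERN = re.compile(r"[0-9]*\.?[0-9]+")


def matches(value: str) -> bool:
    # Assumption: the pattern must cover the entire cell value.
    return PATTERN.fullmatch(value) is not None


for sample in ["12", "12.5", ".5", "12.", "-3.2", "1e5", ""]:
    print(f"{sample!r}: {matches(sample)}")
```

Under that assumption the pattern accepts "12", "12.5" and ".5", but rejects a trailing dot ("12."), negative values ("-3.2"), exponent notation ("1e5") and empty cells. If negative numbers are expected, a leading optional minus sign would need to be added to the pattern.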

DavidUnderdown commented 3 years ago

That, or use range as described above.

marceloverdijk commented 3 years ago

Thanks @DavidUnderdown for confirming. I must say I like the spec for validating some CSV exports I have. I'm just wondering whether the spec is still being actively worked on (1.2)? My schemas are now full of regexes, which could easily be replaced with a more readable Numeric Expression if one were available. Similar to positiveInteger, it would be great to have integer, negativeInteger, decimal, positiveDecimal and negativeDecimal expressions added to the spec.