digital-preservation / csv-schema

CSV Schema
http://digital-preservation.github.io/csv-schema
Mozilla Public License 2.0

Regex for timezone erroneous #43

Open mightyCelu opened 1 year ago

mightyCelu commented 1 year ago

Both the regex for XsdTimezoneComponent and the one for its optional variant are erroneous:

((\+|-)(0[1-9]|1[0-9]|2[0-4]):(0[0-9]|[1-5][0-9])|Z)

This does not allow timezone offsets whose hour part is 00, e.g. +00:30.
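One quick way to see this is to run the quoted pattern through a regex engine. The sketch below uses Python's `re` module, with the pipes written as plain alternation, and assumes the pattern must match the whole timezone suffix (hence `fullmatch`). The sample offsets are chosen purely for illustration. It also shows that the pattern accepts `+24:59`, which lies outside the -14:00 to +14:00 range that XML Schema allows for timezones.

```python
import re

# XsdTimezoneComponent as quoted above, with "|" as regex alternation.
CURRENT = re.compile(r"((\+|-)(0[1-9]|1[0-9]|2[0-4]):(0[0-9]|[1-5][0-9])|Z)")

samples = ["Z", "+01:00", "-05:30", "+01:42", "+00:30", "+00:00", "-00:00", "+24:59"]

for tz in samples:
    # fullmatch: the whole string must be a timezone component, not just a prefix.
    verdict = "match" if CURRENT.fullmatch(tz) else "no match"
    print(f"{tz:>7}  {verdict}")
```

Every offset with a 00 hour is rejected, while `+01:42` and `+24:59` are accepted.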

On a similar note, the minute part of the regex could be simplified from 0[0-9]|[1-5][0-9] to [0-5][0-9].
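A small exhaustive check supports the claim that the two minute patterns are interchangeable. This sketch (Python's `re`, full-match semantics assumed) compares both forms on every two-digit string from 00 to 99:

```python
import re

# The original two-branch minute pattern and the proposed single class.
ORIGINAL = re.compile(r"0[0-9]|[1-5][0-9]")
SIMPLIFIED = re.compile(r"[0-5][0-9]")

# Every two-digit string "00".."99".
candidates = [f"{n:02d}" for n in range(100)]

# True if both patterns agree on every candidate.
same = all(
    bool(ORIGINAL.fullmatch(s)) == bool(SIMPLIFIED.fullmatch(s)) for s in candidates
)
accepted = [s for s in candidates if SIMPLIFIED.fullmatch(s)]

print(same, accepted[0], accepted[-1], len(accepted))  # True 00 59 60
```

Both accept exactly 00 through 59, so the single character class is only a readability change.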

DavidUnderdown commented 1 year ago

While half-hour timezones do exist, so far as I'm aware neither +00:30 nor -00:30 is used (nor anything else strictly between -01:00 and +01:00), so I'm not sure it is actually erroneous to define it that way. There may also be a good reason why the minute part is written that way; @adamretter, do you have any idea?

mightyCelu commented 1 year ago

That is true. However, I still think the inconsistency is worth addressing. The set of timezones in use can change, and it is also conceivable to have data at a time offset that does not correspond to a commonly accepted timezone. The currently specified regex already allows for this (e.g. it accepts +01:42), just not for offsets beginning with 00. Moreover, the offsets +00:00 and -00:00 are also not valid under the current regex. While they are equivalent to Z, I feel it would be arbitrary to forbid them, especially since the referenced data type specification (3.7.2.3) permits these values as well.
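To make the equivalence with Z concrete, the sketch below converts a timezone suffix into a signed offset in minutes. The helper `offset_minutes` is hypothetical and only illustrates the arithmetic (sign times hours*60 + minutes); it deliberately does not enforce any range, so it is not a validator.

```python
import re

# Hypothetical helper: "Z" or "+HH:MM" / "-HH:MM" -> signed offset in minutes.
_OFFSET = re.compile(r"([+-])(\d{2}):(\d{2})")


def offset_minutes(tz: str) -> int:
    if tz == "Z":
        return 0
    m = _OFFSET.fullmatch(tz)
    if m is None:
        raise ValueError(f"not a timezone offset: {tz!r}")
    sign = -1 if m.group(1) == "-" else 1
    return sign * (int(m.group(2)) * 60 + int(m.group(3)))


for tz in ["Z", "+00:00", "-00:00", "+01:42", "-00:30"]:
    print(tz, offset_minutes(tz))
```

Z, +00:00 and -00:00 all map to an offset of 0, while +01:42 and -00:30 are ordinary non-zero offsets (102 and -30 minutes).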

All in all, I would suggest the following regex:

((\+|-)((0[0-9]|1[0-3]):([0-5][0-9])|14:00)|Z)

which includes the following changes: