I don't know if it's actually used anywhere, but the pattern used for `is_iso_date`:

https://github.com/nexB/saneyaml/blob/40e5fa7c0b6e0012452053839184e5cd29802063/src/saneyaml.py#L330

...is too liberal, because alternation (`|`) has lower precedence than nearly all other RE syntax, so without parentheses it splits the entire pattern into two alternatives. As a result, the pattern will match either anything that starts with `19`, or a date of the form `20[0-9][0-9]-[01][0-9]-[0123]?[1-9]`.

To alternate only `19` and `20`, but not the rest of the pattern, they need to be enclosed in parentheses: `(19|20)`

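A quick way to see the effect, as a minimal sketch. The ungrouped pattern string below is assembled from the two alternatives described above and is an assumption, not a verbatim copy of the saneyaml source; the names `LIBERAL` and `GROUPED` are illustrative only.

```python
import re

# Assumed reconstruction of the pattern as described in this issue;
# the literal in saneyaml.py may be spelled differently.
LIBERAL = re.compile(r'19|20[0-9][0-9]-[01][0-9]-[0123]?[1-9]')

# Same pattern with the alternation limited to the century digits.
GROUPED = re.compile(r'(19|20)[0-9][0-9]-[01][0-9]-[0123]?[1-9]')

for text in ('1999-12-25', '19 bottles of beer', '2004-01-15'):
    liberal = LIBERAL.match(text)
    grouped = GROUPED.match(text)
    # Show how much of the input each pattern actually consumed.
    print(repr(text),
          '| ungrouped:', repr(liberal.group(0)) if liberal else None,
          '| grouped:', repr(grouped.group(0)) if grouped else None)
```

With the ungrouped form, any input beginning with `19` is accepted after consuming only those two characters, while the grouped form has to see the rest of the date.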
...The tail end of the pattern is questionable, as well. Since the last character accepts only `[1-9]`, but the character preceding it is optional, it will truncate the match for any day ending in `0`, only matching on the first 9 characters (out of 10). This means that, for example, `2004-01-30` matches, but only as far as `2004-01-3`. Therefore, broken strings like `2004-01-3a` will also match.

The pattern should have a non-optional leading day digit, since ISO 8601 doesn't recognize strings of the form `2004-1-1` or `2004-01-1` as valid dates; all 8 digits are required.
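The truncation and one possible fix can be sketched as follows. Both pattern strings are assumptions for illustration: `TAIL_OPTIONAL` follows the day-of-month tail described above, and `CANDIDATE` is just one pattern consistent with the two fixes discussed here (grouped century, two mandatory day digits), not necessarily the exact pattern proposed. The helper `matched_prefix` is hypothetical. The candidate checks shape only, so a string like `2004-19-39` would still pass.

```python
import re

# Assumed tail as described above: optional tens digit, then [1-9].
TAIL_OPTIONAL = re.compile(r'(19|20)[0-9][0-9]-[01][0-9]-[0123]?[1-9]')

# Hypothetical candidate: both day digits mandatory, last digit may be 0.
CANDIDATE = re.compile(r'(19|20)[0-9][0-9]-[01][0-9]-[0123][0-9]')


def matched_prefix(pattern, text):
    """Return the text consumed by pattern.match(), or None if no match."""
    m = pattern.match(text)
    return m.group(0) if m else None


for text in ('2004-01-30', '2004-01-3a', '2004-01-1', '2004-1-1'):
    # Compare what each pattern consumes from the start of the input.
    print(text,
          '| optional tail:', matched_prefix(TAIL_OPTIONAL, text),
          '| candidate:', matched_prefix(CANDIDATE, text))
```

Because `match()` only anchors at the start of the string, the candidate would still accept trailing garbage such as `2004-01-30a`. Callers that need to reject that could use `fullmatch()` instead; whether that suits the callers of `is_iso_date` is not something this sketch can confirm.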