danielrichman / strict-rfc3339

Strict, simple, lightweight RFC3339 functions
GNU General Public License v3.0

Only ASCII digits should be accepted #8

Open lumikanta opened 5 years ago

lumikanta commented 5 years ago

In Python 3, \d in a str pattern matches any Unicode decimal digit, not just ASCII 0-9, so the current regex accepts non-ASCII digits:

rfc3339_regex = re.compile( r"^(\d\d\d\d)-(\d\d)-(\d\d)T" r"(\d\d):(\d\d):(\d\d)(\.\d+)?(Z|([+-])(\d\d):(\d\d))$")

Restricting the pattern to ASCII digits would make the results safer:

rfc3339_regex = re.compile( r"^([0-9]{4})-([0-9]{2})-([0-9]{2})T" r"([0-9]{2}):([0-9]{2}):([0-9]{2})(\.[0-9]+)?(Z|([+-])([0-9]{2}):([0-9]{2}))$")

For example with Python 3 all these are treated as valid: ['2018-12-24T11:32:00Z', '٢٠١٨-٠١-٠١T١٢:٠٣:١٦.٨٧٩+08:00', '௨௦௫௮-௧௧-௦௯T04:04:04Z', '๒๐๑๘-๐๒-๑๒T๒๒:๒๒:๒๒.๒๒๒Z', '201๒-0๒-0๒T๒๒:๒๒:๒๒.๒๒๒Z']

With Python 2.7, only the first one is accepted.
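
A minimal sketch for reproducing this on Python 3, not the library's actual code. It compares the \d pattern with the proposed [0-9] pattern. It also tries a third variant that was not proposed in this issue and is only an assumption here: keeping \d but compiling with re.ASCII, which exists only on Python 3. The helper with_digits_from is hypothetical and builds the non-ASCII samples from code points, so the test strings do not depend on copy-pasting the digits correctly.

```python
import re

# Pattern from the issue: on Python 3, \d in a str pattern matches any
# Unicode decimal digit, not only ASCII 0-9.
unicode_regex = re.compile(
    r"^(\d\d\d\d)-(\d\d)-(\d\d)T"
    r"(\d\d):(\d\d):(\d\d)(\.\d+)?(Z|([+-])(\d\d):(\d\d))$")

# Proposed fix: explicit ASCII character classes.
ascii_regex = re.compile(
    r"^([0-9]{4})-([0-9]{2})-([0-9]{2})T"
    r"([0-9]{2}):([0-9]{2}):([0-9]{2})(\.[0-9]+)?"
    r"(Z|([+-])([0-9]{2}):([0-9]{2}))$")

# Alternative (an assumption, not from the thread): keep \d, but compile
# with re.ASCII so that \d only matches [0-9]. Python 3 only.
ascii_flag_regex = re.compile(unicode_regex.pattern, re.ASCII)


def with_digits_from(timestamp, zero_codepoint):
    """Hypothetical helper: replace ASCII digits with another script's digits."""
    table = {ord("0") + i: zero_codepoint + i for i in range(10)}
    return timestamp.translate(table)


samples = [
    "2018-12-24T11:32:00Z",                                # ASCII
    with_digits_from("2018-01-01T12:03:16.879Z", 0x0660),  # Arabic-Indic digits
    with_digits_from("2058-11-09T04:04:04Z", 0x0BE6),      # Tamil digits
    with_digits_from("2018-02-12T22:22:22.222Z", 0x0E50),  # Thai digits
]

variants = [
    ("\\d", unicode_regex),
    ("[0-9]", ascii_regex),
    ("\\d with re.ASCII", ascii_flag_regex),
]

for name, regex in variants:
    accepted = [s for s in samples if regex.match(s)]
    print(name, "accepts", len(accepted), "of", len(samples))
```

On Python 3 the \d pattern should accept all four samples, while both ASCII-only variants should accept only the first. The [0-9] form has the advantage of behaving the same on Python 2.7 and Python 3.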

danielrichman commented 5 years ago

ooh, good point, thanks. I'll try and dust off the python dev environment and fix this (and the other things in the backlog) soon.