earlephilhower / newlib-xtensa

newlib-xtensa fork intended for esp8266
GNU General Public License v2.0

TZ: Issue parsing glibc timezones #12

Closed d-a-v closed 3 years ago

d-a-v commented 4 years ago

ref: https://github.com/esp8266/Arduino/pull/7699

POSIX timezone strings shipped by common Linux distributions running glibc are sometimes parsed incorrectly by newlib.

Their format starts with ABRVnn[ABRV[nn]][,...], where ABRV is a human-readable abbreviation and nn is the offset from UTC. For example, GMT0BST,... is the London TZ descriptor, with two abbreviations: GMT and BST ("British Summer Time").

Such abbreviations are not defined for every timezone around the world. https://data.iana.org/time-zones/theory.html (source) states:

If there is no common English abbreviation, use numeric offsets like -05 and +0530 that are generated by zic's %z notation.

These numeric offsets are enclosed in <...>. For example, the TZ string for Sao Paulo is <-03>3, where <-03> is the abbreviation (instead of a letters-only form such as SAOPAULO3, which would be valid).
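To make the two abbreviation forms concrete, here is a minimal Python sketch that splits the leading abbreviation off a TZ string. It is not newlib's or glibc's parser: the function name `split_abbreviation` is hypothetical, and the accepted character sets (three or more letters unquoted; letters, digits, `+` and `-` inside `<...>`) are an assumption about the intended syntax.

```python
import re

# Assumed syntax: either <...> holding letters, digits, '+' or '-',
# or a plain run of three or more letters.
_ABBR = re.compile(r"<([A-Za-z0-9+-]{3,})>|([A-Za-z]{3,})")


def split_abbreviation(tz):
    """Return (abbreviation, is_quoted, remainder) for the start of a TZ string."""
    m = _ABBR.match(tz)
    if m is None:
        raise ValueError("no abbreviation at start of %r" % tz)
    quoted = m.group(1) is not None
    abbr = m.group(1) if quoted else m.group(2)
    return abbr, quoted, tz[m.end():]


if __name__ == "__main__":
    print(split_abbreviation("GMT0BST,M3.5.0/1,M10.5.0"))
    print(split_abbreviation("<-03>3"))
```

A parser that only accepts the letters-only branch would stop at the `<` in `<-03>3`, which matches the kind of failure described here.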

The full path from the official definitions starts at the above repository: zic, the zoneinfo compiler, uses files defining timezones on all continents to build the /usr/share/zoneinfo/* files shipped by most Linux distributions. These files are in turn parsed by this tool to produce a CSV file used by esp8266/arduino. Quite a large number of the abbreviations are numeric.

The issue is that numeric abbreviations like <-03>3 are incorrectly parsed by newlib.

On the other hand, glibc's TZ parser seems able to parse them, even though numeric abbreviations do not seem to follow the POSIX TZ definition.

Abbreviation values are unused in the esp8266/arduino time library anyway. To work around the parsing issue, numeric abbreviations are (about to be) converted to a POSIX-compliant random string by a script (in the PR referenced at the top of this message).
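The workaround can be sketched roughly as follows. This is not the script from the PR: the function name `replace_numeric_abbreviations` and the placeholder names `UNK` and `UNL` are hypothetical, and fixed placeholders are used instead of random strings so the result is reproducible.

```python
import re

# Assumed form of a quoted abbreviation: <...> with letters, digits, '+' or '-'.
_QUOTED = re.compile(r"<[A-Za-z0-9+-]+>")


def replace_numeric_abbreviations(tz, placeholders=("UNK", "UNL")):
    """Swap each <...> abbreviation for a letters-only placeholder.

    The offsets and transition rules that follow each abbreviation are
    left untouched; only the names change.
    """
    found = _QUOTED.findall(tz)
    if len(found) > len(placeholders):
        raise ValueError("more quoted abbreviations than placeholders")
    remaining = iter(placeholders)
    return _QUOTED.sub(lambda m: next(remaining), tz)


if __name__ == "__main__":
    for tz in ("<-03>3", "GMT0BST,M3.5.0/1,M10.5.0"):
        print(tz, "->", replace_numeric_abbreviations(tz))
```

Because the abbreviation values are unused by the time library, replacing them should not change the computed local time, only the names the parser sees.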