Open MichaelChirico opened 4 years ago
I'm not experiencing the same behavior as you are under Catalina.
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 07:00:40 EST"
oldtz=Sys.getenv('TZ') Sys.setenv(TZ='Asia/Jakarta') # UTC+7 as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 07:00:40 WIB"
Sys.setenv(TZ=oldtz)
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 07:00:40 EST"
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 07:00:40 EST"
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] graphics grDevices datasets stats utils methods base
other attached packages:
[1] bit64_0.9-7 bit_1.1-15.2 devtools_2.2.2 usethis_1.5.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 magrittr_1.5 pkgload_1.0.2 R6_2.4.1 rlang_0.4.4 fansi_0.4.1 tools_3.6.2
[8] pkgbuild_1.0.6 sessioninfo_1.1.1 cli_2.0.1 withr_2.1.2 ellipsis_0.3.0 remotes_2.1.1.9000 assertthat_0.2.1
[15] digest_0.6.25 rprojroot_1.3-2 crayon_1.3.4 processx_3.4.2 callr_3.4.2 fs_1.3.1 ps_1.3.2
[22] testthat_2.3.1 memoise_1.1.0 glue_1.3.1 compiler_3.6.2 desc_1.2.0 backports_1.1.5 prettyunits_1.1.1
Bob, I reinstall R and the issue still occurs.
I get "" (empty string) from Sys.getenv('TZ')
. Do you get the same value?
The reason I'm asking this is that if I set "TZ" to anything else than "", the issue won't happen. It's reproducible only when the "TZ" environment variable is unset.
Well, it's verified by Michael Chirico. See https://github.com/Rdatatable/data.table/pull/4261#issuecomment-593084116
You are not actually setting back the TZ environment variable. It is initially unset, and you are setting it to an empty string, which is not the same.
To actually restore it you would need something like
oldtz <- Sys.getenv("TZ", unset = NA) ... if (is.NA(oldtz) Sys.unsetenv("TZ") else Sys.setenv(TZ = oldtz)
In a fresh session you can see that setting TZ to an empty string produces the result you find surprising. I don't know if that is consistent with documented behavior or not.
Sys.getenv("TZ", unset = NA)
[1] NA
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 07:00:40 CST"
Sys.setenv(TZ = "")
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 01:00:40 CST"
Luke, thanks for pointing this out. Just realize that I've misunderstood this for so long.
Anyway, as of the issue itself, I fail to find out the proper documentation on either ?as.POSIXct
, ?strptime
or ?timezone
.
In addition, this behavior looks odd, in my opinion, as an empty "TZ" env variable should be regarded as "unset" or "meaningless", instead of returning surprising results.
So I will leave this issue open for now.
?Sys.timezone clearly says
If ‘TZ’ is set but invalid, most platforms default to ‘UTC’, the
time zone colloquially known as ‘GMT’ (see <URL:
https://en.wikipedia.org/wiki/Coordinated_Universal_Time>). (Some
but not all platforms will give a warning for invalid values.) If
it is unset or empty the _system_ time zone is used (the one
returned by ‘Sys.timezone’).
Hi, Benjamin, thanks for investigating this. However, I don't think it's documented.
First of all, Sys.timezone()
is not necessarily the same as the so-called "current timezone" that's used by as.POSIXct()
. According to ?Sys.timezone
, it "will return the value of TZ if set initially". The resetting on env var "TZ" will affect as.POSIXct()
but not Sys.timezone()
.
Secondly, let's assume the "current timezone" used by as.POSIXct()
follows the same rule "If ‘TZ’ is set but invalid, most platforms default to ‘UTC’, the time zone colloquially known as ‘GMT’". So it should be a fixed one like "UTC" or whatever. However, the below codes show that it's not a fixed one. If I call as.POSIXct
with tz
argument once, the result changes correspondingly.
Sys.setenv(TZ="")
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 15:00:40 CST"
as.POSIXct("2020-01-01 07:00:40", tz = 'Asia/Shanghai')
[1] "2020-01-01 07:00:40 CST"
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 07:00:40 CST"
The key in understanding your last example is that as.POSIX*() function modify TZ
as part of the implementation when they are required to convert time zones in order to get the system calls to use the appropriate time zone:
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 07:00:40 NZDT"
Sys.setenv(TZ="")
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 20:00:40 NZDT"
Sys.getenv("TZ", unset=NA)
[1] ""
as.POSIXct("2020-01-01 07:00:40", "Asia/Shanghai")
[1] "2020-01-01 07:00:40 CST"
Sys.getenv("TZ", unset=NA)
[1] NA
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 07:00:40 NZDT"
So when changing the time zone to "Asia/Shanghai", R has to modify TZ to switch to that zone in order to interpret the time in that zone. It then resets TZ to a value used before, but in this special case unsets it because "" as TZ is not valid time zone.
So the bottom line is that it is a user error to set TZ to "" as that is not a valid time zone. Note that this is true on Linux as well with the exception that there at least the time zone is visible in the pathological case:
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 07:00:40 CST"
Sys.setenv(TZ="")
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 07:00:40 UTC"
as.POSIXct("2020-01-01 07:00:40","US/Eastern")
[1] "2020-01-01 07:00:40 EST"
Sys.getenv("TZ", unset=NA)
[1] NA
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 07:00:40 CST"
Simon, thanks for explaining this. Yes, now I understand the root cause is the invalid TZ that the user provides.
However, there's one minor issue that R doesn't throw warnings for this special invalid "TZ" value. In contrast, R will warn the user (at least on macOS) for other invalid "TZ" values, as long as it's not empty.
Despite the low chance that people will actually set "TZ" to an empty string, when they do, the situation is often a surprise and difficult to figure out.
Sys.setenv(TZ="")
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 15:00:40 CST"
Sys.setenv(TZ="ABCD")
as.POSIXct("2020-01-01 07:00:40")
[1] "2020-01-01 07:00:40 GMT" Warning message: In strptime(xx, f, tz = tz) : unknown timezone 'ABCD'
Anyway, I admit it's a small issue so I'll let you decide to close this issue or not.
Thanks again for all of your answers.
It turns out, that the way the time zones are loaded is to use relative paths from $R_HOME/shared/zoneinfo and the check is simply to see if there is read access. So if you use an empty string, it resolves to that directory itself which is readable, so there is no warning, but when R tries to read the time zone info, it is bogus as it is not a file but rather a directory.
So I think we need to change extra/tzone/localtime.c to be more careful when reading the time zone information and also check whether the path is a regular file.
Also, interestingly, it seems that the code was written to specifically allow ..
in the path, so you can construct paths to other parts of the system, e.g. TZ="../../bin/R" will read in the R start script as the time zone info. It seems deliberate, since TZ also explicitly allows absolute paths such as "/etc", but I would argue that it may be prudent to only allow relative paths for security reasons. This is far away from the original report, but I think tightening the look up may be beneficial in more than one way.
BTW: I'm wondering if we should split this into two issues:
1) TZ="" should be treated as unset TZ (since it is treated as invalid currently but does revert to unset after TZ manipulations)
2) TZ set to values that don't yield a valid time zone should result in a warning in all cases (currently it only issues a warning when the file doesn't exist)
Hi, I'm using R3.6.2 on macOS. The issue is if we set the environment variable TZ to some value and called
as.POSIXct
. After we set the "TZ" ENV back to the original value, the value returned fromas.POSIXct
is different from the correct one. We have to call as.POSIXct with tz = "anything other than the default" for once, in order to have the correct value back.### The reproducible code
### the session info
R version 3.6.2 (2019-12-12) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Catalina 10.15.3
Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] usethis_1.5.1
loaded via a namespace (and not attached): [1] compiler_3.6.2 htmltools_0.4.0.9002 tools_3.6.2 fs_1.3.1
[5] glue_1.3.1 yaml_2.2.1 Rcpp_1.0.3 rmarkdown_2.1
[9] knitr_1.27.2 xfun_0.12 digest_0.6.23 rlang_0.4.4
[13] evaluate_0.14
METADATA