MichaelChirico / r-bugs

A ⚠️read-only⚠️mirror of https://bugs.r-project.org/
20 stars 0 forks source link

[BUGZILLA #17724] as.POSIXct returns incorrect results if env "TZ" is touched on macOS #6898

Open MichaelChirico opened 4 years ago

MichaelChirico commented 4 years ago

Hi, I'm using R3.6.2 on macOS. The issue is if we set the environment variable TZ to some value and called as.POSIXct. After we set the "TZ" ENV back to the original value, the value returned from as.POSIXct is different from the correct one. We have to call as.POSIXct with tz = "anything other than the default" for once, in order to have the correct value back.

### The reproducible code

as.POSIXct("2020-01-01 07:00:40")
#> [1] "2020-01-01 07:00:40 CST"
oldtz=Sys.getenv('TZ')
Sys.setenv(TZ='Asia/Jakarta') # UTC+7
as.POSIXct("2020-01-01 07:00:40")
#> [1] "2020-01-01 07:00:40 WIB"
Sys.setenv(TZ=oldtz)

# after we set back the timezone, the as.POSIXct starts returning unexpected results
as.POSIXct("2020-01-01 07:00:40")
#> [1] "2020-01-01 15:00:40 CST"
as.POSIXct("2020-01-01 07:00:40")
#> [1] "2020-01-01 15:00:40 CST"

# if we set the tz explicility for once, the results go back to normal
as.POSIXct("2020-01-01 07:00:40", tz = Sys.timezone())
#> [1] "2020-01-01 07:00:40 CST"
as.POSIXct("2020-01-01 07:00:40")
#> [1] "2020-01-01 07:00:40 CST"

### the session info

R version 3.6.2 (2019-12-12) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Catalina 10.15.3

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] usethis_1.5.1

loaded via a namespace (and not attached): [1] compiler_3.6.2 htmltools_0.4.0.9002 tools_3.6.2 fs_1.3.1
[5] glue_1.3.1 yaml_2.2.1 Rcpp_1.0.3 rmarkdown_2.1
[9] knitr_1.27.2 xfun_0.12 digest_0.6.23 rlang_0.4.4
[13] evaluate_0.14


METADATA

MichaelChirico commented 4 years ago

I'm not experiencing the same behavior as you are under Catalina.

as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 07:00:40 EST"

oldtz=Sys.getenv('TZ') Sys.setenv(TZ='Asia/Jakarta') # UTC+7 as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 07:00:40 WIB"

Sys.setenv(TZ=oldtz)

as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 07:00:40 EST"

as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 07:00:40 EST"
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] graphics  grDevices datasets  stats     utils     methods   base     

other attached packages:
[1] bit64_0.9-7    bit_1.1-15.2   devtools_2.2.2 usethis_1.5.1 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3         magrittr_1.5       pkgload_1.0.2      R6_2.4.1           rlang_0.4.4        fansi_0.4.1        tools_3.6.2       
 [8] pkgbuild_1.0.6     sessioninfo_1.1.1  cli_2.0.1          withr_2.1.2        ellipsis_0.3.0     remotes_2.1.1.9000 assertthat_0.2.1  
[15] digest_0.6.25      rprojroot_1.3-2    crayon_1.3.4       processx_3.4.2     callr_3.4.2        fs_1.3.1           ps_1.3.2          
[22] testthat_2.3.1     memoise_1.1.0      glue_1.3.1         compiler_3.6.2     desc_1.2.0         backports_1.1.5    prettyunits_1.1.1 

METADATA

MichaelChirico commented 4 years ago

Bob, I reinstall R and the issue still occurs.

I get "" (empty string) from Sys.getenv('TZ'). Do you get the same value?

The reason I'm asking this is that if I set "TZ" to anything else than "", the issue won't happen. It's reproducible only when the "TZ" environment variable is unset.


METADATA

MichaelChirico commented 4 years ago

Well, it's verified by Michael Chirico. See https://github.com/Rdatatable/data.table/pull/4261#issuecomment-593084116


METADATA

MichaelChirico commented 4 years ago

You are not actually setting back the TZ environment variable. It is initially unset, and you are setting it to an empty string, which is not the same.

To actually restore it you would need something like

oldtz <- Sys.getenv("TZ", unset = NA) ... if (is.NA(oldtz) Sys.unsetenv("TZ") else Sys.setenv(TZ = oldtz)

In a fresh session you can see that setting TZ to an empty string produces the result you find surprising. I don't know if that is consistent with documented behavior or not.

Sys.getenv("TZ", unset = NA)

[1] NA

as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 07:00:40 CST"

Sys.setenv(TZ = "")
as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 01:00:40 CST"


METADATA

MichaelChirico commented 4 years ago

Luke, thanks for pointing this out. Just realize that I've misunderstood this for so long.

Anyway, as of the issue itself, I fail to find out the proper documentation on either ?as.POSIXct, ?strptime or ?timezone.

In addition, this behavior looks odd, in my opinion, as an empty "TZ" env variable should be regarded as "unset" or "meaningless", instead of returning surprising results.

So I will leave this issue open for now.


METADATA

MichaelChirico commented 4 years ago

?Sys.timezone clearly says

 If ‘TZ’ is set but invalid, most platforms default to ‘UTC’, the
 time zone colloquially known as ‘GMT’ (see <URL:
 https://en.wikipedia.org/wiki/Coordinated_Universal_Time>).  (Some
 but not all platforms will give a warning for invalid values.)  If
 it is unset or empty the _system_ time zone is used (the one
 returned by ‘Sys.timezone’).

METADATA

MichaelChirico commented 4 years ago

Hi, Benjamin, thanks for investigating this. However, I don't think it's documented.

First of all, Sys.timezone() is not necessarily the same as the so-called "current timezone" that's used by as.POSIXct(). According to ?Sys.timezone, it "will return the value of TZ if set initially". The resetting on env var "TZ" will affect as.POSIXct() but not Sys.timezone().

Secondly, let's assume the "current timezone" used by as.POSIXct() follows the same rule "If ‘TZ’ is set but invalid, most platforms default to ‘UTC’, the time zone colloquially known as ‘GMT’". So it should be a fixed one like "UTC" or whatever. However, the below codes show that it's not a fixed one. If I call as.POSIXct with tz argument once, the result changes correspondingly.

Sys.setenv(TZ="")
as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 15:00:40 CST"

as.POSIXct("2020-01-01 07:00:40", tz = 'Asia/Shanghai')

[1] "2020-01-01 07:00:40 CST"

as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 07:00:40 CST"


METADATA

MichaelChirico commented 4 years ago

The key in understanding your last example is that as.POSIX*() function modify TZ as part of the implementation when they are required to convert time zones in order to get the system calls to use the appropriate time zone:

as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 07:00:40 NZDT"

Sys.setenv(TZ="")
as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 20:00:40 NZDT"

Sys.getenv("TZ", unset=NA)

[1] ""

as.POSIXct("2020-01-01 07:00:40", "Asia/Shanghai")

[1] "2020-01-01 07:00:40 CST"

Sys.getenv("TZ", unset=NA)

[1] NA

as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 07:00:40 NZDT"

So when changing the time zone to "Asia/Shanghai", R has to modify TZ to switch to that zone in order to interpret the time in that zone. It then resets TZ to a value used before, but in this special case unsets it because "" as TZ is not valid time zone.

So the bottom line is that it is a user error to set TZ to "" as that is not a valid time zone. Note that this is true on Linux as well with the exception that there at least the time zone is visible in the pathological case:

as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 07:00:40 CST"

Sys.setenv(TZ="")
as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 07:00:40 UTC"

as.POSIXct("2020-01-01 07:00:40","US/Eastern")

[1] "2020-01-01 07:00:40 EST"

Sys.getenv("TZ", unset=NA)

[1] NA

as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 07:00:40 CST"


METADATA

MichaelChirico commented 4 years ago

Simon, thanks for explaining this. Yes, now I understand the root cause is the invalid TZ that the user provides.

However, there's one minor issue that R doesn't throw warnings for this special invalid "TZ" value. In contrast, R will warn the user (at least on macOS) for other invalid "TZ" values, as long as it's not empty.

Despite the low chance that people will actually set "TZ" to an empty string, when they do, the situation is often a surprise and difficult to figure out.

Sys.setenv(TZ="")
as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 15:00:40 CST"

Sys.setenv(TZ="ABCD")
as.POSIXct("2020-01-01 07:00:40")

[1] "2020-01-01 07:00:40 GMT" Warning message: In strptime(xx, f, tz = tz) : unknown timezone 'ABCD'

Anyway, I admit it's a small issue so I'll let you decide to close this issue or not.

Thanks again for all of your answers.


METADATA

MichaelChirico commented 4 years ago

It turns out, that the way the time zones are loaded is to use relative paths from $R_HOME/shared/zoneinfo and the check is simply to see if there is read access. So if you use an empty string, it resolves to that directory itself which is readable, so there is no warning, but when R tries to read the time zone info, it is bogus as it is not a file but rather a directory.

So I think we need to change extra/tzone/localtime.c to be more careful when reading the time zone information and also check whether the path is a regular file.

Also, interestingly, it seems that the code was written to specifically allow .. in the path, so you can construct paths to other parts of the system, e.g. TZ="../../bin/R" will read in the R start script as the time zone info. It seems deliberate, since TZ also explicitly allows absolute paths such as "/etc", but I would argue that it may be prudent to only allow relative paths for security reasons. This is far away from the original report, but I think tightening the look up may be beneficial in more than one way.


METADATA

MichaelChirico commented 4 years ago

BTW: I'm wondering if we should split this into two issues:

1) TZ="" should be treated as unset TZ (since it is treated as invalid currently but does revert to unset after TZ manipulations)

2) TZ set to values that don't yield a valid time zone should result in a warning in all cases (currently it only issues a warning when the file doesn't exist)


METADATA