awalker89 / openxlsx

R package for .xlsx file reading and writing.
Other
365 stars 78 forks source link

Write mixed EDT/EST DateTimes #438

Closed blakeboswell closed 11 months ago

blakeboswell commented 5 years ago

Issue

When writing to a worksheet, the timezone of the first non-na value in a datetime column is used to calculate Excel compatible datetimes for all values in the column.

If the first non-na value's timezone is EST/EDT, then other values in the column that are EDT/EST will be offset improperly by -/+ 1 hour.

See the below example.


library(openxlsx)
library(lubridate)

packageVersion("openxlsx")

x <- data.frame(
  a = c(ymd_hms("2018-03-12 11:00:00", tz = "America/New_York"),
        ymd_hms("2018-03-10 11:00:00", tz = "America/New_York")),
  b = c(ymd_hms("2018-03-10 11:00:00", tz = "America/New_York"),
        ymd_hms("2018-03-12 11:00:00", tz = "America/New_York"))
)

write.xlsx(x, file = "test.xlsx")
y   <- read.xlsx("test.xlsx")
y$a <- convertToDateTime(y$a, tz = "America/New_York")
y$b <- convertToDateTime(y$b, tz = "America/New_York")
y
## [1] '4.1.0'

##                     a                   b
## 1 2018-03-12 11:00:00 EDT 2018-03-10 11:00:00 EST
## 2 2018-03-10 12:00:00 EST 2018-03-12 10:00:00 EDT

See also issue #424.

Fix

This PR adapts the existing code so that all values' timezones in a datetime column are considered rather than only the first non-na value's timezone.

Benchmark

Using all values for the Excel datetime calculation is more computationally expensive. This simple benchmark indicates that the cost is < 1 second increase per 100K datetime values written.

library(openxlsx)
library(lubridate)
library(rbenchmark)

n <- 10^5
t <- 1:n*60 + ymd_hms("1970-03-10 11:00:00", tz = "America/New_York")
x <- data.frame(t1 = t, t2 = t, t3 = t, t4 = t, t5 = t)

benchmark(
  "version" = {write.xlsx(x, file = "test.xlsx")},
  replications = 5,
  columns = c("test", "replications", "elapsed",
              "relative", "user.self", "sys.self")
)
test  replications elapsed relative user.self sys.self
4.0.1            5  12.748        1    11.921    0.742
PR               5  17.767        1    16.697    0.967