JanMarvin / openxlsx2

openxlsx2 - read, write and modify xlsx files
https://janmarvin.github.io/openxlsx2/
Other
122 stars 13 forks source link

FR: Add argument `tz` to `read_xlsx` to be used when `detect_dates=TRUE` #1115

Closed iago-pssjd closed 3 months ago

iago-pssjd commented 3 months ago

Hi, currently, when times are detected, they appear as POSIXct with tz="". This seems due to the fact that read_xlsx uses convert_datetime without specifying tz. Then it would be great if we could pass this argument through read_xlsx.

I know I could use detect_dates=FALSE and later convert_datetime(..., tz = ...), but I only want to do this for va few variables with time, while I have many other variables with just dates which I prefer they being detected automatically.

Thanks!

JanMarvin commented 3 months ago

Hi @iago-pssjd , thanks for reaching out. I might add an option variable for this, but I’m a little confused why it shouldn’t work for you. For time you can also use the hms package, have a look at the book, iirc there’s a section.

JanMarvin commented 3 months ago

Hi, I've added a PR in #1116, could you please check if this works as expected? Unfortunately working with timezones works best for the timezone one is living in. Something like this should install the development branch:

remotes::install_github("JanMarvin/openxlsx2#1116")
iago-pssjd commented 3 months ago

Hi, @JanMarvin Thanks for the answer and for the PR. I tested and, although it does not what I expected, it does what I asked for..., so it is what should be expected. I mean, it works as a way to detail tz to convert_datetimes, although I expected a different behaviour by this last function. I describe below what I expected, although this specific issue can be closed as it was solved what I was asking for.

  1. Specifying nothing or options("openxlsx2.date_tz" = "") I get "2023-05-16 17:11:55 CEST"
  2. Specifying options("openxlsx2.date_tz" = "UTC") I expected to get "2023-05-16 17:11:55 UTC" (same obtained from same database in another format and through another package)
  3. There would also be the possibility of returning the same hour in UTC time, so "2023-05-16 15:11:55 UTC"
  4. However, what I get is "2023-05-16 19:11:55 UTC"
  5. Stranger yet, what I get specifying my default time zone options("openxlsx2.date_tz" = "CEST") is "2023-05-16 19:11:55 CEST".

Thanks!

JanMarvin commented 3 months ago

:smile: I'm afraid I'm not all that familiar with timezones. The behavior you are seeing, is because R converts from one timezone to another. Though that is not permitted by the openxml standard. Iirc openxml requires all date/datetime variables to be UTC. Therefore, if I open it in Europe, Olivier in North-America, or somebody in Australia, we should all see the same date/datetime in spreadsheet software. And obviously in R we should all get back the same date. (If timezones were involved in spreadsheet software I doubt the world would still be spinning ... ).

Maybe there is a way to switch the timezones you are seeing in R, but I guess it would require something like this: