Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0
3.62k stars 985 forks source link

Bug in as.ITime.POSIXct #4085

Closed arunsrinivasan closed 4 years ago

arunsrinivasan commented 4 years ago

I haven't tested on devel but looking at as.ITime.POSIXct, it's identical to the one I've on my system.

require(data.table) # v1.12.2, Rv 3.6.1
Sys.timezone()
# [1] "America/New_York"
x <- Sys.time()
x
# [1] "2019-12-02 12:24:41 EST"
as.ITime(x)
# [1] "17:24:41"

This is incorrect because tz is assumed to be UTC if attr(x, 'tzone') returns NULL which is the case for Sys.time().

> data.table:::as.ITime.POSIXct
function (x, tz = attr(x, "tzone"), ...) 
{
    if (is.null(tz)) 
        tz = "UTC"
    if (tz %chin% c("UTC", "GMT")) 
        as.ITime(unclass(x), ...)
    else as.ITime(as.POSIXlt(x, tz = tz, ...), ...)
}

Why we need tz info to extract HH:MM:SS from a timestamp if I already provide a POSIXct object?

I upgraded from 1.10.4-3 which seems to not have this POSIXct method and delegates to .default, which handles it with as.ITime(as.POSIXlt(x, ...)). Looking at the POSIXct method, it seems similar to old default but with a special case for GMT/UTC. Why so?

MichaelChirico commented 4 years ago

the numeric method (hence unclass), which assumes UTC time, is much more efficient than POSIXlt conversion

the is.null branch does look error prone because of the default tz issue

MichaelChirico commented 4 years ago

@jangorecki added that here:

https://github.com/Rdatatable/data.table/pull/1393

was there a reason for defaulting NULL tz to UTC?

jangorecki commented 4 years ago

@MichaelChirico https://github.com/Rdatatable/data.table/pull/1393/files#diff-913eafb72dfdd454126aa591b02b1b2dR108-R110 doesn't that mean that it should work only for UTC and GMT classes and fall back to old process for other cases, like NULL tz?

MichaelChirico commented 4 years ago

I don't follow well because I'm on mobile but yes I think the POSIXlt route is appropriate for NULL tz. unless we have a reason to keep it I would just delete the is.null branch and all should be fine

jangorecki commented 4 years ago

Yes, check for is.null to replace for UTC was not there in my PR. It has been added in https://github.com/Rdatatable/data.table/commit/347b2cd0b7076c93d0b9956a03650beca73cc4e7 I believe for some reason. It was added for IDate class too. @mattdowle

arunsrinivasan commented 4 years ago

Ah okay, thank you. Seems like an oversight.. I think tz="" would be a better replacement if NULL.

MichaelChirico commented 4 years ago

Hmm, strangely base::as.Date defaults to tz='UTC', perhaps we did for consistency to that?

as.Date.POSIXct
function (x, tz = "UTC", ...) 
{
    if (tz == "UTC") {
        z <- floor(unclass(x)/86400)
        attr(z, "tzone") <- NULL
        .Date(z)
    }
    else as.Date(as.POSIXlt(x, tz = tz))
}
<bytecode: 0x7f8385f97c60>
<environment: namespace:base>
arunsrinivasan commented 4 years ago

That's a good point. Perhaps that's the reason... Hm, now I'm not so sure.

MichaelChirico commented 4 years ago

I see this in r-devel from @joshuaulrich:

https://stat.ethz.ch/pipermail/r-devel/2013-March/066221.html

Would it make sense for as.Date.POSIXct to not assume tz="UTC" if the
POSIXct object has a valid tzone attribute?  Current behavior may be
confusing in certain cases, for example:

> (d <- structure(1090450800, tzone="Europe/Berlin",
+ class=c("POSIXct","POSIXt")))
[1] "2004-07-22 01:00:00 CEST"
> as.Date(d)
[1] "2004-07-21"
> as.Date(as.POSIXlt(d))
[1] "2004-07-22"
> as.Date(d, tz=attr(d,'tzone'))
[1] "2004-07-22"

Seems nothing changed since then however...

The more I look at it the more I think we should break from base R behavior which feels wrong

MichaelChirico commented 4 years ago

Also apparently no response here to Josh 😿

https://stat.ethz.ch/pipermail/r-devel/2018-October/077005.html

I would like to propose a .Internal() POSIXct2Date() function, similar
to do_POSIXlt2D() in src/main/datetime.c.  I created a working
prototype, which is now included in Dirk Eddelbuettel's RApiDatetime
package:
joshuaulrich commented 4 years ago

Kurt Hornik responded privately to my question about my POSIXct2Date() proposal. He was open to the change, but did not have capacity to take it on at the time. Please email me if you're interested in helping me push this forward.