Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0
3.62k stars 985 forks source link

Faster second, minute and hour for objects of class ITime #3158

Closed arunsrinivasan closed 5 years ago

arunsrinivasan commented 5 years ago

The functions second, minute and hour currently have a faster method, by bypassing as.POSIXlt(), for POSIXct objectss under UTC. See here.

The faster logic needs to be also included for ITime objects. The fix is quite simple. The if statement in all these functions need an additional || inherits(x, "ITime")

Here's a quick benchmark:

# Current version
second  <- function(x) {
  if (inherits(x,'POSIXct') && identical(attr(x,'tzone'),'UTC')) {
    # if we know the object is in UTC, can calculate the hour much faster
    as.integer(x) %% 60L
  } else {
    as.integer(as.POSIXlt(x)$sec)
  }
}

# new version
fSecond<- function(x) {
  if (inherits(x, "ITime") || (inherits(x,'POSIXct') && identical(attr(x,'tzone'),'UTC'))) {
    # if we know the object is in UTC, can calculate the hour much faster
    as.integer(x) %% 60L
  } else {
    as.integer(as.POSIXlt(x)$sec)
  }
}

require(data.table)
x <- setattr(0:86399, "class", "ITime")
y <- sample(x, 50e7, TRUE)

system.time(a1 <- second(y))
#    user  system elapsed 
#   29.00   10.72   39.78 
system.time(a2 <- fSecond(y))
#    user  system elapsed 
#    3.51    0.56    4.06 

identical(a1, a2)
# [1] TRUE
MichaelChirico commented 5 years ago

maybe time to just make these time helper functions S3 generic?

On Thu, Nov 22, 2018, 3:38 AM Arun Srinivasan <notifications@github.com wrote:

The functions second, minute and hour currently have a faster method, by bypassing as.POSIXlt(), for POSIXct objectss under UTC. See here https://github.com/Rdatatable/data.table/blob/master/R/IDateTime.R#L275%23L298 .

The faster logic needs to be also included for ITime objects. The fix is quite simple. The if statement in all these functions need an additional || inherits(x, "ITime")

Here's a quick benchmark:

Current versionsecond <- function(x) {

if (inherits(x,'POSIXct') && identical(attr(x,'tzone'),'UTC')) {

if we know the object is in UTC, can calculate the hour much faster

as.integer(x) %% 60L

} else { as.integer(as.POSIXlt(x)$sec) } }

new versionfSecond<- function(x) {

if (inherits(x, "ITime") || (inherits(x,'POSIXct') && identical(attr(x,'tzone'),'UTC'))) {

if we know the object is in UTC, can calculate the hour much faster

as.integer(x) %% 60L

} else { as.integer(as.POSIXlt(x)$sec) } }

require(data.table)x <- setattr(0:86399, "class", "ITime")y <- sample(x, 50e7, TRUE)

system.time(a1 <- second(y))# user system elapsed # 29.00 10.72 39.78 system.time(a2 <- fSecond(y))# user system elapsed # 3.51 0.56 4.06

identical(a1, a2)# [1] TRUE

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Rdatatable/data.table/issues/3158, or mute the thread https://github.com/notifications/unsubscribe-auth/AHQQdbHa7TfSlcjynBexbEK-GmSioOBtks5uxavAgaJpZM4Ytxis .

arunsrinivasan commented 5 years ago

I agree it'll make things neat, but there are no other classes to consider here AFAICT. I think just an additional if-clause would be sufficient for now therefore.

jangorecki commented 5 years ago

There is Dirks nanotime which we already have in suggests. Having if-elses way will be good enough and easy to turn into S3 anyway.