Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0
3.58k stars 976 forks source link

allow subsetting by times #607

Open arunsrinivasan opened 10 years ago

arunsrinivasan commented 10 years ago

Submitted by: Garrett See; Assigned to: Nobody; R-Forge link

The xts package has a last.default method (and a first.default method) that allows for natural language subsetting of dates/times.

e.g.

x <- Sys.time()-1:1000
last(x, '5 minutes')
last(x, '20 seconds')
first(last(x, '5 minutes'), '20 seconds') # first 20 seconds of last 5 minutes

That can be used in the i argument with data.table, but it is not as straightforward/efficient as it might be because it requires converting to either numeric or character and then performing a vector scan

> DT <- data.table(Time=Sys.time()-1:1000, Val=rnorm(1000))
> DT
>                            Time        Val
>    1: 2012-11-13 08:00:27.87488 -0.9738186
>    2: 2012-11-13 08:00:26.87488 -0.0996312
>    3: 2012-11-13 08:00:25.87488 -0.1107350
>    4: 2012-11-13 08:00:24.87488  1.1921946
>    5: 2012-11-13 08:00:23.87488 -1.6558859
> ---
> 
>  996: 2012-11-13 07:43:52.87488  1.7064713
>  997: 2012-11-13 07:43:51.87488 -1.5212433
>  998: 2012-11-13 07:43:50.87488  1.1469169
>  999: 2012-11-13 07:43:49.87488  0.2344396
> 1000: 2012-11-13 07:43:48.87488  0.5432188
> 
> DT[as.numeric(Time) %in% as.numeric(last(Time, '5 minutes'))]
>                           Time        Val
>   1: 2012-11-13 07:56:00.87488  1.0079729
>   2: 2012-11-13 07:56:01.87488  0.4194030
>   3: 2012-11-13 07:56:02.87488  0.5053080
>   4: 2012-11-13 07:56:03.87488 -1.0772980
>   5: 2012-11-13 07:56:04.87488  2.9398212
> ---
> 
> 264: 2012-11-13 08:00:23.87488 -1.6558859
> 265: 2012-11-13 08:00:24.87488  1.1921946
> 266: 2012-11-13 08:00:25.87488 -0.1107350
> 267: 2012-11-13 08:00:26.87488 -0.0996312
> 268: 2012-11-13 08:00:27.87488 -0.9738186
# DT[as.character(last(Time, '5 minutes')) %in% as.character(Time)] # same

Ideally, this would work

DT[last(Time, '5 minutes')]

But, currently it gives:

    Error in `[.data.table`(DT, last(Time, "5 minutes"), ) : 
      i has not evaluated to logical, integer or double

Note that calling xts::last on a timeBased vector (i.e. Date, POSIXct, POSIXlt, etc.) returns a vector of the same class

> tail(last(DT[, Time], '5 minutes'))
> [1] "2012-11-13 08:00:22.87488 CST" "2012-11-13 08:00:23.87488 CST"
> [3] "2012-11-13 08:00:24.87488 CST" "2012-11-13 08:00:25.87488 CST"
> [5] "2012-11-13 08:00:26.87488 CST" "2012-11-13 08:00:27.87488 CST"
jangorecki commented 5 years ago

related #3241