EdwinTh / padr

Padding of missing records in time series
https://edwinth.github.io/padr/

Error: interval is not valid #61

Closed tungttnguyen closed 5 years ago

tungttnguyen commented 5 years ago

Hello,

I want to create a time series at a 1-minute interval and fill the unavailable time slots with 1. However, the code returned "Error: interval is not valid". Did I miss anything obvious? Thanks!


library(tidyverse)
library(padr)

df <- tibble(
  time  = c("09-11-2018 11:00:00", "09-11-2018 11:02:16",
            "09-11-2018 11:05:00", "09-11-2018 11:09:35",
            "09-11-2018 11:13:03", "09-11-2018 11:16:00"),
  value = c(3, 5, 2, 4, 6, 1))
df <- df %>% 
  mutate(time = parse_datetime(time, "%d-%m-%Y %H:%M:%S"))
df
#> # A tibble: 6 x 2
#>   time                value
#>   <dttm>              <dbl>
#> 1 2018-11-09 11:00:00     3
#> 2 2018-11-09 11:02:16     5
#> 3 2018-11-09 11:05:00     2
#> 4 2018-11-09 11:09:35     4
#> 5 2018-11-09 11:13:03     6
#> 6 2018-11-09 11:16:00     1

df %>% 
  thicken('1 minute')
#> Error: interval is not valid

df %>% 
  pad('minute', start_val = as.POSIXct('09-11-2018 11:00:00', tz = 'UTC'))
#> Error: interval is not valid

Created on 2018-11-09 by the reprex package (v0.2.1.9000)

EdwinTh commented 5 years ago

The interval in thicken can be any character string that would be accepted by seq.Date or seq.POSIXt (which it calls). These functions use the abbreviation "min", not the full word "minute". This works:

df %>% 
  thicken('1 min')
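
To see the "min" versus "minute" distinction outside padr, here is a minimal base R sketch that calls seq() on POSIXct values directly. The variable names are illustrative only, and the timestamps are taken from the example above.

```r
# Base R only; start and end are the first and last whole minutes of the example data.
start <- as.POSIXct("2018-11-09 11:00:00", tz = "UTC")
end   <- as.POSIXct("2018-11-09 11:16:00", tz = "UTC")

# "min" is an accepted unit, optionally prefixed by a count.
every_minute        <- seq(start, end, by = "min")
every_second_minute <- seq(start, end, by = "2 min")
length(every_minute)

# "minute" is not an accepted unit for seq() on POSIXct values,
# so the call raises an error, which is captured here instead of stopping the script.
minute_attempt <- tryCatch(seq(start, end, by = "minute"), error = function(e) e)
inherits(minute_attempt, "error")
```

Checking a candidate interval string against seq() like this is a quick way to tell whether it is likely to be rejected as "not valid".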

tungttnguyen commented 5 years ago

Thanks @EdwinTh! However, when I ran this line I got a different error:

df %>% pad('min', start_val = as.POSIXct('09-11-2018 11:00:00', tz = 'UTC'))
Error: Estimated 1056616516 returned rows, larger than 1 million in break_above

How can that be when there are only 16 minutes between the start and end times?

EdwinTh commented 5 years ago

That is because as.POSIXct expects character input in yyyy-mm-dd order, so your start_val was parsed as a date in the year 9 AD.
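
The parsing issue can be reproduced in base R. The sketch below compares the default parse with one that spells out the day-month-year layout; the format string mirrors the one passed to parse_datetime in the original post, and the variable names are illustrative.

```r
# Without a format, as.POSIXct() tries year-first layouts such as "%Y-%m-%d",
# so the leading "09" is read as the year.
misparsed <- as.POSIXct("09-11-2018 11:00:00", tz = "UTC")
as.POSIXlt(misparsed)$year + 1900   # the year as it was parsed

# Spelling out the day-month-year layout gives the intended timestamp.
intended <- as.POSIXct("09-11-2018 11:00:00",
                       format = "%d-%m-%Y %H:%M:%S", tz = "UTC")
format(intended, "%Y-%m-%d %H:%M:%S")
```

With a start_val roughly two thousand years before the data, a minute-level pad would need on the order of a billion rows, which is consistent with the break_above error.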

Note that you don't need to specify start_val if it is equal to the first value in your time variable. Note further that you cannot pad this data frame on the minute interval, because the observations do not fall on whole minutes (they differ at the second level).
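
One possible way around the second-level differences, sketched here as an assumption rather than as the approach recommended in this thread, is to thicken the timestamps to whole minutes first and then pad the thickened column. The column name time_min is made up for this example, and the sketch relies on every observation falling in a different minute; otherwise the rows would need to be aggregated before padding.

```r
library(dplyr)
library(padr)

# Same data as in the original post, parsed with an explicit day-month-year format.
df <- tibble(
  time = as.POSIXct(
    c("09-11-2018 11:00:00", "09-11-2018 11:02:16",
      "09-11-2018 11:05:00", "09-11-2018 11:09:35",
      "09-11-2018 11:13:03", "09-11-2018 11:16:00"),
    format = "%d-%m-%Y %H:%M:%S", tz = "UTC"),
  value = c(3, 5, 2, 4, 6, 1))

padded <- df %>%
  thicken("min", colname = "time_min") %>%  # round each timestamp down to its minute
  select(-time) %>%                         # keep a single datetime column for pad()
  pad("min") %>%                            # insert rows for the missing minutes
  fill_by_value(value, value = 1)           # inserted rows get the value 1

padded
```

No start_val is passed because the series should start at the first observation, as noted above.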

tungttnguyen commented 5 years ago

Thanks @EdwinTh! I was hoping that padr could somehow deal with irregular intervals. I'll have to stick with a manual method then.

EdwinTh commented 5 years ago

Yes, you can, using pad_cust.

tungttnguyen commented 5 years ago

Thanks @EdwinTh! Can you post a solution for the example I posted earlier? I think it could be useful for other people too.