google / CausalImpact

An R package for causal inference in time series
Apache License 2.0

Daily time series with zeros #11

Open lmkirvan opened 7 years ago

lmkirvan commented 7 years ago

I have an issue that I'm having trouble debugging. I'm not convinced it's a bug, but I thought I should post it here anyway. I have a fair number (10+) of daily time series that I'm using as control series. Some of them contain 0 entries at daily granularity. When I run the model on the daily series, I get the following error message:

> impact <- CausalImpact(
+   final_series, 
+   pre.period = first_last(pre_index),
+   post.period = first_last(post_index),
+   model.args = list(nseasons = 7),
+   alpha = .2
+ )
Error in data.frame(y.model, cum.y.model, point.pred, cum.pred) : 
  arguments imply differing number of rows: 979, 1459

I've tried dropping out control time series one by one without luck.

When I convert the same time series to weekly, most (but not all) of the zero entries are removed and the model runs without any problems.

I'm at a bit of a loss as to why this error is occurring. I'm using the latest version of the package.

> packageVersion("CausalImpact")
[1] ‘1.2.1’

jbao commented 7 years ago

Do you happen to have a reproducible example?

lmkirvan commented 7 years ago

I'll try to post one tomorrow. Thanks!

jbao commented 6 years ago

Actually, I encountered the same problem as @lmkirvan; see the reproducible example below.

library(zoo)
library(CausalImpact)

ts <- zoo(c(5L, 5L, 2L, 2L, 2L, 2L, 3L, 1L, 1L, 2L, 0L, 0L, 0L, 0L, 2L, 
  0L, 2L, 3L, 2L, 5L, 4L, 7L, 6L, 4L, 9L, 6L, 2L, 3L, 3L, 0L, 5L, 
  1L, 4L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 2L, 4L, 6L, 1L, 3L, 
  5L, 2L, 3L, 1L, 4L, 2L, 2L, 2L, 0L, 1L, 2L, 0L, 1L, 1L, 0L, 0L, 
  1L, 1L, 1L, 1L, 3L, 2L, 0L, 1L, 2L, 0L, 0L, 1L, 3L, 3L, 0L, 2L, 
  0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L, 
  1L, 0L, 2L, 1L, 0L, 2L, 2L, 6L, 1L, 5L, 1L, 0L, 0L, 1L, 0L, 0L, 
  4L, 0L, 4L, 7L, 4L, 6L, 9L, 4L, 7L, 3L, 6L),
  structure(c(1505304000, 1505307600, 1505311200, 1505314800, 1505318400, 
          1505322000, 1505325600, 1505329200, 1505332800, 1505336400, 1505340000, 
          1505343600, 1505347200, 1505350800, 1505354400, 1505358000, 1505361600, 
          1505365200, 1505368800, 1505372400, 1505376000, 1505379600, 1505383200, 
          1505386800, 1505390400, 1505394000, 1505397600, 1505401200, 1505404800, 
          1505408400, 1505412000, 1505415600, 1505419200, 1505422800, 1505426400, 
          1505430000, 1505433600, 1505437200, 1505440800, 1505444400, 1505448000, 
          1505451600, 1505455200, 1505458800, 1505462400, 1505466000, 1505469600, 
          1505473200, 1505476800, 1505480400, 1505484000, 1505487600, 1505491200, 
          1505494800, 1505498400, 1505502000, 1505505600, 1505512800, 1505516400, 
          1505520000, 1505523600, 1505527200, 1505530800, 1505534400, 1505538000, 
          1505541600, 1505545200, 1505548800, 1505552400, 1505556000, 1505559600, 
          1505563200, 1505566800, 1505570400, 1505574000, 1505577600, 1505581200, 
          1505584800, 1505588400, 1505592000, 1505595600, 1505599200, 1505602800, 
          1505606400, 1505610000, 1505613600, 1505617200, 1505620800, 1505624400, 
          1505628000, 1505631600, 1505635200, 1505638800, 1505642400, 1505646000, 
          1505649600, 1505653200, 1505656800, 1505660400, 1505664000, 1505667600, 
          1505671200, 1505674800, 1505678400, 1505682000, 1505685600, 1505689200, 
          1505692800, 1505696400, 1505700000, 1505703600, 1505707200, 1505710800, 
          1505714400, 1505718000, 1505721600, 1505725200, 1505728800, 1505732400, 
          1505736000, 1505739600, 1505743200), class = c("POSIXct", "POSIXt"
          ), tzone = "GMT")
)

pre.period <- structure(c(1505304000, 1505732400), class = c("POSIXct", "POSIXt"))
post.period <- structure(c(1505736000, 1505743200), class = c("POSIXct", "POSIXt"))
impact <- CausalImpact(ts, pre.period, post.period, model.args = list(niter = 5000, nseasons = 24))

I'm not really sure if the issue is caused by the 0s, though.

lmkirvan commented 6 years ago

Thanks for posting, I've been meaning to get around to recreating the error.

alhauser commented 6 years ago

Thanks for sharing the reproducible example.

The problem is caused by the fact that the times are irregular:

diff(time(ts))

shows the granularity is mostly hourly, but there is one 2-hour gap.
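
To locate such gaps without reading the full diff output by eye, the step sizes can be tabulated. This is only a minimal sketch on a small made-up hourly series; the names `x`, `steps`, and `gap.starts` are illustrative and do not come from the example above.

```r
library(zoo)

# Hypothetical hourly series in which the 03:00 time stamp is missing.
stamps <- as.POSIXct("2017-09-13 00:00:00", tz = "GMT") + 3600 * c(0, 1, 2, 4, 5)
x <- zoo(c(5L, 5L, 2L, 2L, 3L), stamps)

# Step sizes between consecutive time stamps, expressed in hours.
steps <- as.numeric(diff(time(x)), units = "hours")

# Tabulate the step sizes; any value other than 1 marks an irregular spot.
print(table(steps))

# Time stamps that are immediately followed by a gap.
gap.starts <- time(x)[-length(time(x))][steps != 1]
print(gap.starts)
```

Applied to the series from the reproducible example, the same calls should flag the single point that is followed by the 2-hour step.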

No fix ready yet (working on it), but as a workaround, you can regularize the time series before fitting the model:

times <- seq(start(ts), end(ts), by = "hour")
ts.regularized <- merge(ts, zoo(, times), all = TRUE)
diff(time(ts.regularized))

Now that the granularity is always 1 hour (having value NA at the time point that was missing before), the model fit does not fail anymore:

impact <- CausalImpact(ts.regularized, pre.period, post.period, 
                       model.args = list(niter = 5000, nseasons = 24))

jbao commented 6 years ago

Hi @alhauser , thanks, that's indeed the case!

chandanshikhar1 commented 5 years ago

Thanks for the advice. Works for me too.

stumakha commented 4 years ago

Thank you. Doesn't having NAs in the time series after you add back the missing dates cause CausalImpact's NA check to error out?

"Error: !anyNA(data[, -1]) is not TRUE"

I did try ts.regularized[is.na(ts.regularized)] <- 0

(having value NA at the time point that was missing before),

stumakha commented 4 years ago

na.fill worked, but I am surprised the workaround worked for everyone else without backfilling the NAs.

ts.regularized <- na.fill(ts.regularized, fill = 0)
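
A possible explanation for the difference, as far as I understand the package documentation: the response (first column) may contain NA, but covariates (all other columns) may not, which matches the `!anyNA(data[, -1])` check. The reproducible example above is univariate, so the NA introduced by regularizing lands in the response only. Below is a rough sketch that fills only the covariate columns and leaves the response NA in place. The data and the names `data`, `grid`, and `filled` are made up for illustration, and whether 0 is a sensible fill value depends on what a missing observation means in your data.

```r
library(zoo)

# Hypothetical two-column hourly series: response first, one covariate second.
# The 03:00 time stamp is missing.
stamps <- as.POSIXct("2017-09-13 00:00:00", tz = "GMT") + 3600 * c(0, 1, 2, 4, 5)
data <- zoo(cbind(y = c(5, 5, 2, 2, 3), x1 = c(1, 0, 2, 1, 4)), stamps)

# Regularize to a strictly hourly grid; the missing hour becomes a row of NAs.
grid <- seq(start(data), end(data), by = "hour")
data.regularized <- merge(data, zoo(, grid), all = TRUE)

# Work on the underlying matrix: replace NAs in the covariate columns only
# (everything except column 1), leaving the response untouched.
m <- coredata(data.regularized)
m[, -1][is.na(m[, -1])] <- 0
filled <- zoo(m, time(data.regularized))

print(filled)
```

The resulting `filled` object has no NA outside its first column, so it should pass the covariate check while the response keeps its NA at the previously missing time point.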

nba2020 commented 4 years ago

Not sure if this is still an open issue, but when I faced a similar challenge I created a blank data frame with all the days I wanted to study:

date_range <- data.frame(date = seq(as.Date('2016-07-01'), as.Date(Today - 1), by = 1))

Then I merged it with a left join like below and filled the NAs with 0s:

Data <- merge(x = date_range, y = Data, by = "date", all.x = TRUE)

After that, CausalImpact ran smoothly, and the daily 0 observations were not an issue when they appeared.
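
For completeness, here is a self-contained sketch of that left-join approach in base R, including the fill step. The `Data` values and the `count` column name are invented for illustration; only the `date` key matches the description above.

```r
# Hypothetical daily counts with 2016-07-03 and 2016-07-04 missing.
Data <- data.frame(
  date = as.Date(c("2016-07-01", "2016-07-02", "2016-07-05")),
  count = c(3, 0, 7)
)

# One row per calendar day in the study window.
date_range <- data.frame(
  date = seq(as.Date("2016-07-01"), as.Date("2016-07-05"), by = "day")
)

# Left join: keep every day in date_range; days absent from Data get NA.
Data <- merge(x = date_range, y = Data, by = "date", all.x = TRUE)

# Treat a missing day as zero observations (an assumption about the data).
Data$count[is.na(Data$count)] <- 0

print(Data)
```

Naming the column explicitly in `date_range` matters here, because `merge(..., by = "date")` needs a `date` column on both sides.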