christophsax / tempdisagg

Methods for Temporal Disaggregation and Interpolation of Time Series
http://cran.r-project.org/web/packages/tempdisagg

faster denton? #14

Closed christophsax closed 5 years ago

christophsax commented 10 years ago

denton (and denton-cholette, uniform) are very slow. Some ideas:

christophsax commented 7 years ago

Reviewed the Denton code but there seems no easy way to make it faster.

An interesting observation is that the Fernandez method, used with a constant, leads to exactly the same results as Denton-Cholette:

library(tempdisagg)
data(swisspharma)

a <- predict(td(sales.a ~ 1, to = "quarterly", method = "fernandez"))
b <- predict(td(sales.a ~ 1, to = "quarterly", method = "denton-cholette"))
all.equal(a, b)
# [1] TRUE
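One way to see why the two coincide (my reading, not stated in the thread): with a constant indicator, additive Denton reduces to minimizing the sum of squared first differences of the high-frequency series subject to the aggregation constraints, and Cholette's correction drops the spurious penalty on the level of the first observation. A minimal base-R sketch of that constrained least-squares problem on toy data (an illustration, not the package's implementation):

```r
# Sketch of the optimization behind Denton-Cholette with a constant
# indicator: minimize sum((y[t] - y[t-1])^2) subject to the quarterly
# values summing to the annual totals. Toy data, base R only.
n_lf <- 3                 # number of annual (low-frequency) values
s    <- 4                 # quarters per year
n    <- n_lf * s
Y    <- c(100, 120, 90)   # annual totals

# aggregation matrix: each row sums one year's quarters
C <- kronecker(diag(n_lf), matrix(1, 1, s))

# (n-1) x n first-difference matrix; no penalty on the level of y[1],
# which is Cholette's correction to the original Denton method
D <- diag(n)[-1, ] - diag(n)[-n, ]
A <- t(D) %*% D

# solve the equality-constrained least squares via the KKT system
K <- rbind(cbind(2 * A, t(C)),
           cbind(C, matrix(0, n_lf, n_lf)))
sol <- solve(K, c(rep(0, n), Y))
y <- sol[1:n]

# the benchmarked series reproduces the annual totals
all.equal(as.numeric(C %*% y), Y)
# [1] TRUE
```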

However, it turns out that Fernandez is not much faster than Denton-Cholette. I did some speed tests with the new irregular data feature #30, where we want to disaggregate monthly series to daily, which involves much more data:

system.time(m0 <- td(y ~ 1, lf = lf, lf.end = "2016-12-31", hf = hf, method = "denton-cholette"))
   user  system elapsed 
  8.288   0.052   8.348 

system.time(m1 <- td(y ~ 1, lf = lf, lf.end = "2016-12-31", hf = hf, method = "fernandez"))
   user  system elapsed 
  6.552   0.040   6.595 

all.equal(predict(m0), predict(m1))
[1] TRUE

However, since Fernandez can be approximated by the Chow-Lin method with a fixed rho slightly below 1, we get a speedup of roughly a factor of 50:

system.time(m2 <- td(y ~ 1, lf = lf, lf.end = "2016-12-31", hf = hf, method = "chow-lin-fixed", fixed.rho = 0.99999))
   user  system elapsed 
  0.160   0.003   0.164 

all.equal(predict(m0), predict(m2))
[1] "Mean relative difference: 1.711644e-05"

Results are not identical, but very close. As soon as speed matters (and it will for long time series), this last method is probably the most useful.
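A linear-algebra reading of why the approximation works (my own illustration, not from the package code): Fernandez assumes random-walk residuals, Chow-Lin assumes AR(1) residuals, and as rho approaches 1 the AR(1) precision matrix converges to the random-walk one everywhere except in the treatment of the initial observation. A base-R sketch:

```r
# Why Chow-Lin with rho near 1 mimics Fernandez: compare the two
# precision (inverse covariance) matrices of the residuals.
n <- 6
rho <- 0.99999

# Fernandez: random-walk residuals u[t] = u[t-1] + e[t] with u[0] = 0;
# precision is t(D) %*% D for the difference matrix with D[1, 1] = 1
D <- diag(n)
D[cbind(2:n, 1:(n - 1))] <- -1
P_fernandez <- t(D) %*% D

# Chow-Lin: stationary AR(1) residuals; precision (up to scale) is
# tridiagonal with diagonal (1, 1 + rho^2, ..., 1 + rho^2, 1)
P_chowlin <- diag(c(1, rep(1 + rho^2, n - 2), 1))
P_chowlin[cbind(2:n, 1:(n - 1))] <- -rho
P_chowlin[cbind(1:(n - 1), 2:n)] <- -rho

# all entries agree to about 2e-05, except the [1, 1] entry
# (the initial condition), where the two differ by exactly 1
max(abs(P_fernandez - P_chowlin)[-1])
```

So for any reasonably long series the two models imply nearly the same smoothing, which matches the mean relative difference of about 2e-05 seen above.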

Perhaps we should simply add an example to the documentation showing that Chow-Lin is great for univariate disaggregation and close this longstanding issue. @petersteiner any thoughts?

christophsax commented 7 years ago

Added a new method, "fast", a shortcut for chow-lin-fixed with fixed.rho = 0.99999.

library(tempdisagg)

# --- A new method: 'fast' -----------------------------------------------------

data(swisspharma)

# the new 'fast' method (method = "chow-lin-fixed", fixed.rho = 0.99999)
mod3 <- td(sales.a ~ 1, to = "quarterly", method = "denton-cholette")
mod4 <- td(sales.a ~ 1, to = "quarterly", method = "fast")

# not identical, but very similar
all.equal(predict(mod4), predict(mod3), tolerance = 1e-06)

library(microbenchmark)
microbenchmark(
  td(sales.a ~ 1, to = "quarterly", method = "denton-cholette"),
  td(sales.a ~ 1, to = "quarterly", method = "fast")
)

# Unit: milliseconds
#                                                           expr       min        lq      mean    median        uq      max neval
#  td(sales.a ~ 1, to = "quarterly", method = "denton-cholette") 20.027719 21.584639 22.925591 22.335442 23.458105 37.96290   100
#             td(sales.a ~ 1, to = "quarterly", method = "fast")  3.128817  3.335624  3.707853  3.453258  3.637311 17.55677   100
christophsax commented 5 years ago

The new method, "fast", is now on master. It is particularly useful for the new high-frequency disaggregations, such as to daily.