christophsax / tempdisagg

Methods for Temporal Disaggregation and Interpolation of Time Series
http://cran.r-project.org/web/packages/tempdisagg
37 stars 5 forks source link

How to disaggregate quarterly series to weekly #38

Closed vikaskumar30 closed 4 years ago

vikaskumar30 commented 5 years ago

Hi, I am getting the following error when calling the td function in the below code Error: time column needs to specified as the first date of the period

rm(list=ls())
library(quantmod)
library(tempdisagg)
library(readxl)

# Qtrly series
data<-read_excel("qtrly_series.xlsx", sheet = 1, col_names = c("Date","Value"), col_types = c("date","numeric"))
data_cs_ts<-as.xts(data$Value,order.by=as.Date(data$Date))

# Weekly series
data<-read_excel("weekly_series.xlsx", sheet = 1, col_names = c("Date","Value"), col_types = c("date","numeric"))
data_30y_ts<-as.xts(data$Value,order.by=as.Date(data$Date))

# Interpolating qtrly to weekly series 
interp_fit=td(data_cs_ts~data_30y_ts,conversion="average",to=13,method="chow-lin-maxlog")
data_mir<-as.xts(predict(interp_fit),order.by=as.Date(index(data_30y_ts)))

qtrly_series.xlsx weekly_series.xlsx

vikaskumar30 commented 5 years ago

Hi @christophsax just bumping this again.

christophsax commented 5 years ago

If you use timestamp based time series within tempdisagg, the date column/index must to specified as the first day in the period. If you have quarterly data, it must be the first day of the quarter.

Weekly and quarterly data overlap each other. In order to make this work, you need to make your weekly series daily first, then use the daily series as an indicator for the quarterly series. Finally, aggregate the daily series back to week. Here is how it can be done:

library(tidyverse)
library(tsbox)
library(tempdisagg)

y_q <- readxl::read_excel("qtrly_series.xlsx", col_names = c("time", "value")) %>%
  mutate(time = as.Date(time)) %>%
  tsbox:::ts_first_of_period()
x_w <- readxl::read_excel("weekly_series.xlsx", col_names = c("time", "value")) %>%
  mutate(time = as.Date(time)) %>%
  tsbox:::ts_first_of_period()

Make your weekly series daily first. You can do this like that:

m_1 <- td(x_w ~ 1, to = "day", method = "fast")
x_d <- predict(m_1)
x_d
#> # A tibble: 10,836 x 2
#>    time       value
#>    <date>     <dbl>
#>  1 1990-01-01  1.39
#>  2 1990-01-02  1.39
#>  3 1990-01-03  1.39
#>  4 1990-01-04  1.39
#>  5 1990-01-05  1.39
#>  6 1990-01-06  1.40
#>  7 1990-01-07  1.40
#>  8 1990-01-08  1.40
#>  9 1990-01-09  1.40
#> 10 1990-01-10  1.40
#> # … with 10,826 more rows

fast means faster than the other methods, but week to day is still really slow. Be patient. In a second step, you can disaggregate your quarterly series to daily (which is faster):

m_2 <- td(y_q ~ x_d, method = "fast", conversion = "mean")
y_d <- predict(m_2)
y_d
#> # A tibble: 10,836 x 2
#>    time       value
#>    <date>     <dbl>
#>  1 1990-01-01  10.0
#>  2 1990-01-02  10.0
#>  3 1990-01-03  10.0
#>  4 1990-01-04  10.0
#>  5 1990-01-05  10.0
#>  6 1990-01-06  10.0
#>  7 1990-01-07  10.0
#>  8 1990-01-08  10.0
#>  9 1990-01-09  10.0
#> 10 1990-01-10  10.0
#> # … with 10,826 more rows

y_d can be aggregated to weekly again:

y_w <-
  y_d %>%
  group_by(
    time = as.Date(
      paste(lubridate::year(time), lubridate::week(time), 7),
      format = "%Y %U %u"
    ) - 6
  ) %>%
  summarize(value = mean(value)) %>%
  ungroup() %>%
  filter(!is.na(time))  # NA at the end

ts_plot(y_w, y_q)

Created on 2019-09-15 by the reprex package (v0.3.0)

christophsax commented 3 years ago

Update October 2021. With a few changes in tsbox and tempdisagg, the process of disaggregating quarterly (or monthly) data to weekly got slightly easier.

Here is a minimal example:

library(tidyverse)
library(tsbox)
library(tempdisagg)

data(tempdisagg)

a weekly indicator (optional)

x_w <- ts_tbl(ts_frequency(spi.d, to = "week", aggregate = "sum"))
x_w
#> # A tibble: 784 × 2
#>    time        value
#>    <date>      <dbl>
#>  1 2005-01-02 29914.
#>  2 2005-01-09 29797.
#>  3 2005-01-16 29987.
#>  4 2005-01-23 30036.
#>  5 2005-01-30 30322.
#>  6 2005-02-06 30779.
#>  7 2005-02-13 30909.
#>  8 2005-02-20 30959.
#>  9 2005-02-27 31364.
#> 10 2005-03-06 31480.
#> # … with 774 more rows

target series

y_q <- ts_tbl(gdp.q)
y_q
#> # A tibble: 59 × 2
#>    time         value
#>    <date>       <dbl>
#>  1 2005-01-01 133101.
#>  2 2005-04-01 136320.
#>  3 2005-07-01 137694.
#>  4 2005-10-01 139476.
#>  5 2006-01-01 139205.
#>  6 2006-04-01 141112.
#>  7 2006-07-01 142682.
#>  8 2006-10-01 145394.
#>  9 2007-01-01 144916.
#> 10 2007-04-01 147339.
#> # … with 49 more rows

Comparing scaled versions of the two series

ts_plot(ts_scale(ts_c(y_q, x_w)))

Step 0: Prepare Daily Indicator

(if you are using an indicator series):

Make your weekly series daily first (yes, we already have a daily series, but the example demonstrates how to deal with weekly indicators). You can do this like that:

m_1 <- td(x_w ~ 1, to = "day", method = "fast", conversion = "sum")
x_d <- predict(m_1)
x_d
#> # A tibble: 5,488 × 2
#>    time       value
#>    <date>     <dbl>
#>  1 2005-01-02 4280.
#>  2 2005-01-03 4279.
#>  3 2005-01-04 4277.
#>  4 2005-01-05 4275.
#>  5 2005-01-06 4272.
#>  6 2005-01-07 4268.
#>  7 2005-01-08 4263.
#>  8 2005-01-09 4258.
#>  9 2005-01-10 4255.
#> 10 2005-01-11 4253.
#> # … with 5,478 more rows

fast means faster than the other methods, but week to day is still really slow. Be patient.

Step 1: Disaggregate to Daily

In a second step, you can disaggregate your quarterly series to daily (which is faster):

m_2 <- td(y_q ~ x_d, method = "fast", conversion = "sum")
#> High frequency series shorter than low frequency. Discarding low frequency before 2005-04-01.
y_d <- predict(m_2)
y_d
#> # A tibble: 5,488 × 2
#>    time       value
#>    <date>     <dbl>
#>  1 2005-01-02 1495.
#>  2 2005-01-03 1495.
#>  3 2005-01-04 1495.
#>  4 2005-01-05 1495.
#>  5 2005-01-06 1495.
#>  6 2005-01-07 1495.
#>  7 2005-01-08 1495.
#>  8 2005-01-09 1495.
#>  9 2005-01-10 1495.
#> 10 2005-01-11 1495.
#> # … with 5,478 more rows

If you are not using an indicator series, the command is:

m_3 <- td(y_q ~ 1, method = "fast", conversion = "sum")

The temporal disaggregtion method ensures that the quarterly values of the daily series are the same as those of the target series:

ts_frequency(y_d, to = "quarter", aggregate = "sum")
#> # A tibble: 59 × 2
#>    time         value
#>    <date>       <dbl>
#>  1 2005-04-01 136320.
#>  2 2005-07-01 137694.
#>  3 2005-10-01 139476.
#>  4 2006-01-01 139205.
#>  5 2006-04-01 141112.
#>  6 2006-07-01 142682.
#>  7 2006-10-01 145394.
#>  8 2007-01-01 144916.
#>  9 2007-04-01 147339.
#> 10 2007-07-01 148992.
#> # … with 49 more rows

Step 2: Aggregate to Weekly

y_d can be aggregated to weekly again:

y_w <- ts_frequency(y_d, to = "week", aggregate = "sum")

Comparing a scaled version of the result to the target series:

ts_plot(ts_scale(ts_c(y_q, y_d)))

Due to the overlap of the weeks, the quarterly values are not identical to those of the target series:

ts_frequency(y_w, to = "quarter", aggregate = "sum")
#> # A tibble: 60 × 2
#>    time         value
#>    <date>       <dbl>
#>  1 2005-01-01 136246.
#>  2 2005-04-01 136313.
#>  3 2005-07-01 136206.
#>  4 2005-10-01 137973.
#>  5 2006-01-01 140760.
#>  6 2006-04-01 141103.
#>  7 2006-07-01 141137.
#>  8 2006-10-01 154990.
#>  9 2007-01-01 135321.
#> 10 2007-04-01 147339.
#> # … with 50 more rows

Created on 2021-10-18 by the reprex package (v2.0.0)