Closed vikaskumar30 closed 4 years ago
Hi @christophsax just bumping this again.
If you use timestamp based time series within tempdisagg, the date column/index must to specified as the first day in the period. If you have quarterly data, it must be the first day of the quarter.
Weekly and quarterly data overlap each other. In order to make this work, you need to make your weekly series daily first, then use the daily series as an indicator for the quarterly series. Finally, aggregate the daily series back to week. Here is how it can be done:
library(tidyverse)
library(tsbox)
library(tempdisagg)
y_q <- readxl::read_excel("qtrly_series.xlsx", col_names = c("time", "value")) %>%
mutate(time = as.Date(time)) %>%
tsbox:::ts_first_of_period()
x_w <- readxl::read_excel("weekly_series.xlsx", col_names = c("time", "value")) %>%
mutate(time = as.Date(time)) %>%
tsbox:::ts_first_of_period()
Make your weekly series daily first. You can do this like that:
m_1 <- td(x_w ~ 1, to = "day", method = "fast")
x_d <- predict(m_1)
x_d
#> # A tibble: 10,836 x 2
#> time value
#> <date> <dbl>
#> 1 1990-01-01 1.39
#> 2 1990-01-02 1.39
#> 3 1990-01-03 1.39
#> 4 1990-01-04 1.39
#> 5 1990-01-05 1.39
#> 6 1990-01-06 1.40
#> 7 1990-01-07 1.40
#> 8 1990-01-08 1.40
#> 9 1990-01-09 1.40
#> 10 1990-01-10 1.40
#> # … with 10,826 more rows
fast
means faster than the other methods, but week to day is still really
slow. Be patient.
In a second step, you can disaggregate your quarterly series to daily
(which is faster):
m_2 <- td(y_q ~ x_d, method = "fast", conversion = "mean")
y_d <- predict(m_2)
y_d
#> # A tibble: 10,836 x 2
#> time value
#> <date> <dbl>
#> 1 1990-01-01 10.0
#> 2 1990-01-02 10.0
#> 3 1990-01-03 10.0
#> 4 1990-01-04 10.0
#> 5 1990-01-05 10.0
#> 6 1990-01-06 10.0
#> 7 1990-01-07 10.0
#> 8 1990-01-08 10.0
#> 9 1990-01-09 10.0
#> 10 1990-01-10 10.0
#> # … with 10,826 more rows
y_d
can be aggregated to weekly again:
y_w <-
y_d %>%
group_by(
time = as.Date(
paste(lubridate::year(time), lubridate::week(time), 7),
format = "%Y %U %u"
) - 6
) %>%
summarize(value = mean(value)) %>%
ungroup() %>%
filter(!is.na(time)) # NA at the end
ts_plot(y_w, y_q)
Created on 2019-09-15 by the reprex package (v0.3.0)
Update October 2021. With a few changes in tsbox and tempdisagg, the process of disaggregating quarterly (or monthly) data to weekly got slightly easier.
Here is a minimal example:
library(tidyverse)
library(tsbox)
library(tempdisagg)
data(tempdisagg)
a weekly indicator (optional)
x_w <- ts_tbl(ts_frequency(spi.d, to = "week", aggregate = "sum"))
x_w
#> # A tibble: 784 × 2
#> time value
#> <date> <dbl>
#> 1 2005-01-02 29914.
#> 2 2005-01-09 29797.
#> 3 2005-01-16 29987.
#> 4 2005-01-23 30036.
#> 5 2005-01-30 30322.
#> 6 2005-02-06 30779.
#> 7 2005-02-13 30909.
#> 8 2005-02-20 30959.
#> 9 2005-02-27 31364.
#> 10 2005-03-06 31480.
#> # … with 774 more rows
target series
y_q <- ts_tbl(gdp.q)
y_q
#> # A tibble: 59 × 2
#> time value
#> <date> <dbl>
#> 1 2005-01-01 133101.
#> 2 2005-04-01 136320.
#> 3 2005-07-01 137694.
#> 4 2005-10-01 139476.
#> 5 2006-01-01 139205.
#> 6 2006-04-01 141112.
#> 7 2006-07-01 142682.
#> 8 2006-10-01 145394.
#> 9 2007-01-01 144916.
#> 10 2007-04-01 147339.
#> # … with 49 more rows
Comparing scaled versions of the two series
ts_plot(ts_scale(ts_c(y_q, x_w)))
(if you are using an indicator series):
Make your weekly series daily first (yes, we already have a daily series, but the example demonstrates how to deal with weekly indicators). You can do this like that:
m_1 <- td(x_w ~ 1, to = "day", method = "fast", conversion = "sum")
x_d <- predict(m_1)
x_d
#> # A tibble: 5,488 × 2
#> time value
#> <date> <dbl>
#> 1 2005-01-02 4280.
#> 2 2005-01-03 4279.
#> 3 2005-01-04 4277.
#> 4 2005-01-05 4275.
#> 5 2005-01-06 4272.
#> 6 2005-01-07 4268.
#> 7 2005-01-08 4263.
#> 8 2005-01-09 4258.
#> 9 2005-01-10 4255.
#> 10 2005-01-11 4253.
#> # … with 5,478 more rows
fast
means faster than the other methods, but week to day is still really
slow. Be patient.
In a second step, you can disaggregate your quarterly series to daily (which is faster):
m_2 <- td(y_q ~ x_d, method = "fast", conversion = "sum")
#> High frequency series shorter than low frequency. Discarding low frequency before 2005-04-01.
y_d <- predict(m_2)
y_d
#> # A tibble: 5,488 × 2
#> time value
#> <date> <dbl>
#> 1 2005-01-02 1495.
#> 2 2005-01-03 1495.
#> 3 2005-01-04 1495.
#> 4 2005-01-05 1495.
#> 5 2005-01-06 1495.
#> 6 2005-01-07 1495.
#> 7 2005-01-08 1495.
#> 8 2005-01-09 1495.
#> 9 2005-01-10 1495.
#> 10 2005-01-11 1495.
#> # … with 5,478 more rows
If you are not using an indicator series, the command is:
m_3 <- td(y_q ~ 1, method = "fast", conversion = "sum")
The temporal disaggregtion method ensures that the quarterly values of the daily series are the same as those of the target series:
ts_frequency(y_d, to = "quarter", aggregate = "sum")
#> # A tibble: 59 × 2
#> time value
#> <date> <dbl>
#> 1 2005-04-01 136320.
#> 2 2005-07-01 137694.
#> 3 2005-10-01 139476.
#> 4 2006-01-01 139205.
#> 5 2006-04-01 141112.
#> 6 2006-07-01 142682.
#> 7 2006-10-01 145394.
#> 8 2007-01-01 144916.
#> 9 2007-04-01 147339.
#> 10 2007-07-01 148992.
#> # … with 49 more rows
y_d
can be aggregated to weekly again:
y_w <- ts_frequency(y_d, to = "week", aggregate = "sum")
Comparing a scaled version of the result to the target series:
ts_plot(ts_scale(ts_c(y_q, y_d)))
Due to the overlap of the weeks, the quarterly values are not identical to those of the target series:
ts_frequency(y_w, to = "quarter", aggregate = "sum")
#> # A tibble: 60 × 2
#> time value
#> <date> <dbl>
#> 1 2005-01-01 136246.
#> 2 2005-04-01 136313.
#> 3 2005-07-01 136206.
#> 4 2005-10-01 137973.
#> 5 2006-01-01 140760.
#> 6 2006-04-01 141103.
#> 7 2006-07-01 141137.
#> 8 2006-10-01 154990.
#> 9 2007-01-01 135321.
#> 10 2007-04-01 147339.
#> # … with 50 more rows
Created on 2021-10-18 by the reprex package (v2.0.0)
Hi, I am getting the following error when calling the td function in the below code Error: time column needs to specified as the first date of the period
qtrly_series.xlsx weekly_series.xlsx