bfast2 / bfast

Breaks For Additive Season and Trend
https://bfast2.github.io
GNU General Public License v2.0
41 stars 15 forks source link

problems with NAs in regressor of bfaston #62

Closed LILIXU-CCNU closed 4 years ago

LILIXU-CCNU commented 4 years ago

here is the reprex.html of this problem: http://localhost:31875/session/reprex63e869fb6762/reprex_reprex.html

LILIXU-CCNU commented 4 years ago
devtools::install_github("bfast2/strucchange@rsquared-summary")
#> Skipping install of 'strucchange' from a github remote, the SHA1 (6b5ce5c7) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(strucchange)
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
#> Loading required package: sandwich
library(bfast)
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
datats <- ts(rowSums(simts$time.series))
tsp(datats) <- tsp(simts$time.series) # assign correct time series attributes
#al <- datats
#test set NAs  #=====================regressor with NAs
regressor <- seq(from=1, to=199)
#regressor[10]<- 0
regressor[2:50] <- NA
#ccts <- ts(al,start = c(2015,7),frequency = 12)
bpp <- bfastpp(datats,order = 2)
bf<- breakpoints(response ~ (trend+ season+ as.numeric(regressor)), data=bpp, h=0.15, breaks= "LWZ")
#> Error in breakpoints.matrix(X, y, h, breaks, hpc, ...): minimum segment size must be greater than the number of regressors
output <- breakpoints(bf, breaks="LWZ")
#> Error in breakpoints(bf, breaks = "LWZ"): object 'bf' not found
print(output)
#> Error in print(output): object 'output' not found
#===========================================
#test no NAs  #=====================regressor without NAs
regressor <- seq(from=1, to=199)
#regressor[20]<- 0
#ccts <- ts(al,start = c(2015,7),frequency = 12)
bpp <- bfastpp(datats,order = 2)
bf<- breakpoints(response ~ (trend+ season+ as.numeric(regressor)), data=bpp, h=0.15, breaks= "LWZ")
output <- breakpoints(bf, breaks="LWZ")
print(output)
#> 
#>   Optimal 2-segment partition: 
#> 
#> Call:
#> breakpoints.breakpointsfull(obj = bf, breaks = "LWZ")
#> 
#> Breakpoints at observation number:
#> 99 
#> 
#> Corresponding to breakdates:
#> 0.4974874
GreatEmerald commented 4 years ago

The problem here was that season adds a lot of regressors to the model, and the h parameter is quite low to trigger the error. So this is expected behaviour.

In addition, external regressors should be added to the original time series using cbind(), and then bfastpp outputs them as xreg columns in the output, thus the formula would be response~trend+season+xreg. This should be documented better somewhere, probably in ?bfastpp.

A related error one may get while trying something like the above is:

Error in model.frame.default(formula, data = data) : 
  variable lengths differ (found for 'as.numeric(regressor)')

Which results in the fact that NAs are not removed from the regressor, and thus column widths are mismatched. Thus external regressors should always go through bfastpp that makes sure to handle that.