Add outlier exercises - Githubissues

andreranza commented 10 months ago

WIP

andreranza commented 10 months ago

Something to build upon as an additional question in this Exercise section.

library(seasonal)
set.seed(2024)

n_months <- 12
tot_len <- n_months * 5
ev_const <- 1.40 # tweak me

# Creating a time series with monthly frequency
dates <- seq(as.Date("2018-01-01"), by = "month", length.out = tot_len)
data <- rnorm(tot_len, mean = 8, sd = 1)

seasonal_component <- sin(2 * pi * (1:tot_len) / n_months)

# Introducing an outlier in the first time series
data_with_outlier <- data
data_with_outlier <- data_with_outlier + seasonal_component
data_with_outlier[49] <- data_with_outlier[49] + 10 
ts_with_outlier <- ts(data_with_outlier, start = c(2018, 1), frequency = n_months)

# Introducing extreme values in the second time series
data_with_extreme_values <- data
data_with_extreme_values <- data_with_extreme_values + seasonal_component
data_with_extreme_values[49] <- data_with_extreme_values[49] * ev_const 

# Creating second ts object
ts_with_extreme_values <- ts(
  data_with_extreme_values,
  start = c(2018, 1),
  frequency = n_months
)

# Plotting
par(mfrow = c(2, 1))

# Plot for time series with an outlier
plot(
  ts_with_outlier,
  main = "Time Series with an Outlier in 2022",
  ylab = "Value",
  xlab = "Date",
  col = "blue"
)

# Plot for time series with extreme value
plot(
  ts_with_extreme_values,
  main = "Time Series with EV in 2022",
  ylab = "Value",
  xlab = "Date",
  col = "green"
)


m_out <- seas(ts_with_outlier)
summary(m_out)
#> 
#> Call:
#> seas(x = ts_with_outlier)
#> 
#> Coefficients:
#>                   Estimate Std. Error z value Pr(>|z|)    
#> Constant           2.05258    0.02983  68.814  < 2e-16 ***
#> AO2022.Jan         0.78994    0.15053   5.248 1.54e-07 ***
#> AR-Nonseasonal-01  0.43859    0.31377   1.398    0.162    
#> MA-Nonseasonal-01  0.16216    0.34496   0.470    0.638    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> SEATS adj.  ARIMA: (1 0 1)  Obs.: 60  Transform: log
#> AICc:   206, BIC: 215.4  QS (no seasonality in final):0.2021  
#> Box-Ljung (no autocorr.): 32.32   Shapiro (normality): 0.9438 **
#> Messages generated by X-13:
#> Warnings:
#> - At least one visually significant seasonal peak has been found
#>   in one or more of the estimated spectra.

m_ext <- seas(ts_with_extreme_values)
summary(m_ext)
#> 
#> Call:
#> seas(x = ts_with_extreme_values)
#> 
#> Coefficients:
#>                   Estimate Std. Error z value Pr(>|z|)    
#> MA-Nonseasonal-01  0.90142    0.05522   16.32  < 2e-16 ***
#> MA-Seasonal-12     0.99932    0.16195    6.17 6.81e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> SEATS adj.  ARIMA: (0 1 1)(0 1 1)  Obs.: 60  Transform: log
#> AICc: 175.8, BIC: 180.8  QS (no seasonality in final):    0  
#> Box-Ljung (no autocorr.): 31.12   Shapiro (normality): 0.9896  
#> Messages generated by X-13:
#> Notes:
#> - The program cannot perform hypothesis tests for kurtosis on
#>   less than 50 observations.

^{Created on 2024-01-17 with reprex v2.0.2}

andreranza commented 10 months ago

I wonder if this would be more of a Case Study than an exercise.

Possible questions to be asked on the reprex below:

to what extent one should tweak ev_const so the program recognize obs 49 as an outlier?
how would you switch off outlier detection?
would it make sense to make the program recognise the EV as an outlier?
how would you override the default options so that EV is detected as an outlier?
use ev_const = 1.15. observe what happens to the chosen transform. Can you explain why there is a difference?

@christophsax and @jlivsey, how do you feel about those? I would not be able to answer number 5

andreranza commented 10 months ago

There's a more in-depth discussion on the difference between EV and outliers in the X11 chp tough. Maybe this would fit better there.

christophsax commented 10 months ago

For @jlivsey to decide.

christophsax / seasonalbook.content

Add outlier exercises #16