Changepoint detection with a tidy interface
discrepancy between regions returned by changepoints and regions returned by tidychangepoints #108

Open beanumber opened 1 week ago

beanumber commented 1 week ago

Note that the first region is correct, but the others are off by a rounding error.

library(tidychangepoint)
y <- segment(DataCPSim, method = "pelt", penalty = "BIC")
#> $mean
#> [1]  35.28356  58.19948  96.76671 156.51950
#> $variance
#> [1]  126.8758  370.5227  920.9762 2405.9745
#> # A tibble: 4 × 3
#>   region        param_mu param_sigma_hatsq
#>   <chr>            <dbl>             <dbl>
#> 1 [0,547)           35.3              127.
#> 2 [547,822)         58.1              372.
#> 3 [822,972)         96.7              924.
#> 4 [972,1.1e+03]    156.              2442.

library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>     filter, lag
#> The following objects are masked from 'package:base':
#>     intersect, setdiff, setequal, union
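DataCPSim |>
  as_tibble() |>
  mutate(
    # label each observation with its region; each changepoint is the last index of its region
    region = rep(1:4, times = diff(c(0, changepoints(y), nobs(y$segmenter)))),
    id = row_number()
  ) |>
  group_by(region) |>
  summarize(
    N = n(), first = min(id), last = max(id), mean = mean(value), var = var(value)
  )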
#> # A tibble: 4 × 6
#>   region     N first  last  mean   var
#>    <int> <int> <int> <int> <dbl> <dbl>
#> 1      1   547     1   547  35.3  127.
#> 2      2   275   548   822  58.2  372.
#> 3      3   150   823   972  96.8  927.
#> 4      4   124   973  1096 157.  2426.

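The changepoint package appears to treat each changepoint as the closed right endpoint of its interval, whereas tidychangepoint uses it as the closed left endpoint of the next interval. For example, observation 547 falls in region 1 in the manual computation above (rows 1 to 547), but the region label [0,547) excludes it and [547,822) includes it.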
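
The two conventions can be compared without either package. Below is a minimal base-R sketch, assuming a hypothetical series of 10 observations with changepoints at indices 4 and 7 (illustrative values, not taken from DataCPSim). It only demonstrates the labeling difference and is not a description of how either package is implemented internally.

```r
# Hypothetical setup: 10 observations, changepoints at indices 4 and 7.
n   <- 10
cpt <- c(4, 7)
idx <- seq_len(n)

# Right-closed convention: each changepoint is the LAST index of its region,
# i.e. regions cover (0,4], (4,7], (7,10].
region_right_closed <- findInterval(idx, cpt, left.open = TRUE) + 1L

# Left-closed convention: each changepoint is the FIRST index of the next region,
# i.e. regions cover [1,4), [4,7), [7,10].
region_left_closed <- findInterval(idx, cpt) + 1L

# The rep()/diff() labeling used in the reprex matches the right-closed convention.
identical(region_right_closed, rep(1:3, times = diff(c(0, cpt, n))))
#> [1] TRUE

# The two labelings disagree exactly at the changepoints themselves.
idx[region_right_closed != region_left_closed]
#> [1] 4 7
```

Because only one observation per boundary changes region, the per-region means and variances shift slightly rather than dramatically, which is consistent with the small differences in the output above.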

beanumber commented 1 week ago

This may reduce to #60