bfastpp: inconsistent behaviour between formula and full methods

bfast2 / bfast

Breaks For Additive Season and Trend

https://bfast2.github.io

GNU General Public License v2.0

41 stars 15 forks source link

bfastpp: inconsistent behaviour between formula and full methods #50

Closed GreatEmerald closed 4 years ago

GreatEmerald commented 4 years ago

A lot of code is the same between .bfastpp.full() and .bfastpp.formula(), but unfortunately not everything. That leads to inconsistent behaviour, e.g. season is calculated as factor(cycle(y)) in the former and as 1 + seq(as.numeric(cycle(y[1]))-1 in the latter, which leads to incorrect type (numeric instead of the expected factor).

Following DRY principles, the code should be unified.

As an aside but related: bfastpp gives a radically different object when a formula is given, vs when only a ts is given. It's extremely confusing, as the user normally assumes that giving a formula would simply filter out the undesired columns from the result.

GreatEmerald commented 4 years ago

Git blame points to @appelmar for this.

I checked and the same would be achieved by just keeping .bfastpp.full() (i.e. the original function bfastpp) and running model.matrix() on the result to get the matrix version of it, according to the formula. This would also fix the seasonal dummies. Is there any particular reason why it is not done that way at the moment?

appelmar commented 4 years ago

As far as I remember, the reason for separate functions was just to touch as little as possible from the old (non-optimized) implementation. I fully agree that a unified implementation with less redundant code should be preferred. I think avoiding the factor conversion was simply a minor performance optimization. HTH!

GreatEmerald commented 4 years ago

OK, good to know! I need the seasonal dummies to work properly, since another feature depends on it. So I'll make a PR, than ask you for a review.

GreatEmerald commented 4 years ago

I see, that was indeed a deliberate optimisation that saves 1 ms. But the problem is that the solution is not featureful enough, as it doesn't expand dummies and also it doesn't do regular parsing of the formula, so e.g. response~I(trend^2)+harmon doesn't work altogether (trend gets omitted). If this is removed, the rest of the bfastmonitor optimisations are still working fine, so we don't lose so much performance:

Unit: milliseconds
                                                      expr       min        lq     mean    median       uq       max neval
 bfast:::.bfastmonitor.default(NDVIa, start = c(2010, 13)) 14.142997 16.040745 17.71687 16.544131 16.99268 246.62345  1000
          .bfastmonitor.matrix(NDVIa, start = c(2010, 13))  9.388053 10.843289 11.66023 11.193384 11.50773  34.69414  1000
  bfast:::.bfastmonitor.matrix(NDVIa, start = c(2010, 13))  8.314812  9.590819 10.42102  9.878581 10.13878  35.34907  1000

Middle is the version I'm proposing. I think that's fine for now, a hit of 1 ms is unfortunate but worth it for the correct interpretation of the formula.

GreatEmerald commented 4 years ago

This is fixed in #52.