aws / random-cut-forest-by-aws

An implementation of the Random Cut Forest data structure for sketching streaming data, with support for anomaly detection, density estimation, imputation, and more.
https://github.com/aws/random-cut-forest-by-aws
Apache License 2.0

Revisit calibration in RCFCaster to improve forecasts near boundaries (and handle physical infeasibility, such as negative values, etc.) #399

Closed sudiptoguha closed 6 months ago

sudiptoguha commented 10 months ago

RCFCaster seeks to provide conformal forecasts, but it assumes that the data can be transformed in a translation-invariant manner. This implies that the forecast can become negative even when all input values are positive, which can be anathema if negative values are physically impossible (such as the number of tickets, errors, etc.). An easy solution is of course to tack on a corrective step, say max(value, 0), for the forecast (and confidence intervals), and this may be a fine solution in some cases. However, this raises the question: "knowing that some values are physically impossible, can we adjust the forecasting to produce forecasts that respect boundary values?" There is a great deal of discussion of the boundary value versus initial value dichotomy in the forecasting literature, and it would be useful if RCFCaster could absorb such boundary constraints. Of course this would require a change in calibration, because calibration and forced adaptation to constraints are non-commutative operations.
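To make the non-commutativity concrete, here is a minimal sketch in plain Python. It does not use the RCFCaster API; the additive residual-quantile calibration, the helper names (`quantile`, `calibrate`, `clamp`), and the toy data are all assumptions chosen for illustration. It compares "calibrate, then clamp" against "clamp, then calibrate" on the same inputs.

```python
import math

def quantile(sorted_vals, q):
    """Nearest-rank empirical quantile of an already sorted list."""
    idx = min(len(sorted_vals) - 1, max(0, math.ceil(q * len(sorted_vals)) - 1))
    return sorted_vals[idx]

def calibrate(past_forecasts, past_actuals, new_forecast, lo=0.1, hi=0.9):
    """Hypothetical additive calibration: shift the new forecast by the
    empirical quantiles of past residuals (actual - forecast)."""
    residuals = sorted(a - f for f, a in zip(past_forecasts, past_actuals))
    return (new_forecast + quantile(residuals, lo),
            new_forecast + quantile(residuals, hi))

def clamp(x, floor=0.0):
    """Corrective step max(value, floor)."""
    return max(x, floor)

# Toy data (illustrative only): actuals are non-negative, forecasts are not.
past_forecasts = [-2.0, -1.0, 0.5, 1.0, 2.0]
past_actuals = [0.0, 0.0, 1.0, 0.0, 3.0]
new_forecast = -0.5

# Order A: calibrate on the raw forecasts, then clamp the interval.
lo_a, hi_a = calibrate(past_forecasts, past_actuals, new_forecast)
interval_a = (clamp(lo_a), clamp(hi_a))

# Order B: clamp every forecast first, then calibrate on the clamped values.
clamped_past = [clamp(f) for f in past_forecasts]
interval_b = calibrate(clamped_past, past_actuals, clamp(new_forecast))

print("calibrate then clamp:", interval_a)
print("clamp then calibrate:", interval_b)
```

On this toy data the two orders give different intervals, and clamping before calibration can still produce a negative lower bound, so a boundary-aware calibration would need to treat the two steps together rather than chaining them.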

In addition, the current error bounds are exposed as a function of the lookahead step in the forecasting horizon. It would be useful to adapt the error bounds to be a joint function of (lookahead, predicted value). In fact, such a joint function could naturally absorb any physical/boundary constraint, and lead to further refinements of calibration.
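One way to picture error bounds as a joint function of (lookahead, predicted value) is to keep separate residual statistics per lookahead and per range of predicted values. The sketch below is a hypothetical illustration in plain Python, not the approach taken in the library; the class `JointErrorBounds`, its bucket edges, and the nearest-rank quantile are all assumptions.

```python
import math
from collections import defaultdict

class JointErrorBounds:
    """Hypothetical sketch: residual quantiles keyed by
    (lookahead, bucket of the predicted value)."""

    def __init__(self, bucket_edges):
        self.bucket_edges = sorted(bucket_edges)
        self.residuals = defaultdict(list)

    def _bucket(self, predicted):
        # Index of the value range that the prediction falls into.
        return sum(1 for edge in self.bucket_edges if predicted >= edge)

    def update(self, lookahead, predicted, actual):
        key = (lookahead, self._bucket(predicted))
        self.residuals[key].append(actual - predicted)

    def interval(self, lookahead, predicted, lo=0.1, hi=0.9):
        errs = sorted(self.residuals.get((lookahead, self._bucket(predicted)), []))
        if not errs:
            # No history for this cell: return a degenerate interval.
            return predicted, predicted

        def q(p):
            # Nearest-rank empirical quantile.
            return errs[min(len(errs) - 1, max(0, math.ceil(p * len(errs)) - 1))]

        return predicted + q(lo), predicted + q(hi)

bounds = JointErrorBounds(bucket_edges=[1.0, 10.0])
# Toy observations at lookahead 1: (predicted, actual), actuals never negative.
for predicted, actual in [(0.5, 0.0), (0.5, 1.0), (0.5, 2.0),
                          (20.0, 15.0), (20.0, 20.0), (20.0, 26.0)]:
    bounds.update(1, predicted, actual)

near_boundary = bounds.interval(1, 0.5)
far_from_boundary = bounds.interval(1, 20.0)
print(near_boundary, far_from_boundary)
```

In this toy setup the cell near zero learns a narrow, asymmetric error distribution because the actuals cannot fall below zero, while the cell far from the boundary learns a wider one. Bucketing alone does not guarantee feasible bounds for every prediction inside a cell; it only shows how conditioning on the predicted value lets the bounds reflect the constraint.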

sudiptoguha commented 8 months ago

PR #401