Open brookslogan opened 1 year ago
Some thoughts on naming things here and how they should function.
Naming:
max_n_recent
, min_n_recent
max_n_times_per_epikey
, min_n_times_per_epikey
Functionality:
This also ties into discussions about alternative interpretations of n
in epix_slide
, or making epipredict
naturally act on epi_archive
s. Right now, epiprocess main has a different interpretation with epix_slide
's n
parameter, as does the dev branch's replacement, the before
parameter.
The min training window would also be useful as a step, or as part of step_training_window
.
A potential bigger tradeoff we're taking with a max_n_recent
, min_n_recent
approach is potential incompatibility with how things are treated at test time. At train time, we're happy to go search for training data farther in the past. But at test time (since we're fitting one model across all locations/etc.), we are being more demanding: we don't output forecasts for locations that have the expected predictor data missing. Not sure there's a better alternative though.
And another trade-off: it'll be harder to get epix_slide
to feed an appropriate time window of data in using its current approach. But we were considering changing how it does time windowing anyway. [On current epiprocess dev branch: we could just use before = 365000L
to get everything, but that's not efficient. So then we have to think about what the max number of additional times might be grabbed by this n_recent
approach.]
min_n_recent
and max_n_recent
min_n_recent
being mostly informational / for debugging sounds good
Grepping around, @dshemetov and I were unable to see
min_train_window
orquantile_by_key
actually used byarx_forecaster
,arx_classifier
,flatline_forecaster
, or any other forecaster I missed. We did find:min_train_window
part; see comment here.quantile_by_key
docs referencelayer_residual_quantiles()
'by_key
parameter, but I can't see it actually being forwarded along.