davidtedfordholt / fable.bsts

Implementation of the bsts package for use with the fable framework
8 stars 3 forks source link

Add NormalPrior #13

Open davidtedfordholt opened 4 years ago

davidtedfordholt commented 4 years ago

We will need to figure out how to specify a prior intuitively. This includes the question of how much would need to be specified BEFORE the fable workflow, and whether or not things need to be in a tibble with the same key structure, a tsibble with the same key structure and index values, maybe just partial index values, or as columns in the tsibble, itself... I don't know!

This will be alongside SdPrior and all the other specific prior specification methods.

mitchelloharawild commented 4 years ago

Can you point me to an example of how priors are specified in bsts, perhaps with an example?

It may be appropriate to specify priors using distributions via distributional, which will be used by fabletools in the next release.

davidtedfordholt commented 4 years ago

Shouldn't you sleep sometimes?

library(bsts)

state <- list()

state <- bsts::AddLocalLevel(
  state.specification = state,
  y = rnorm(10, mean = 5, sd = 2.5),
  sigma.prior = Boom::SdPrior(
    sigma.guess = 3     # prior guess of value of standard deviation
  ),
  initial.state.prior = Boom::NormalPrior(
    mu = 4,             # mean of the prior distribution
    sigma = 3,          # standard deviation of prior distribution
    initial.value = 4   # initial value of parameter in MCMC
  )
)
davidtedfordholt commented 4 years ago

Different types of priors, of course, come from different distributions, and they have functions (in the Boom and BoomSpikeSlab packages) that define the priors within the functions adding model components. (those packages are, of course, dependencies of bsts)

mitchelloharawild commented 4 years ago

Looking into Boom and BoomSpikeSlab a bit, there certainly are a lot of priors available for bsts. From what I can tell, NormalPrior (Normal dist), SdPrior (inverse Gamma dist), and SpikeSlabPrior are used by bsts. This smaller set of priors could be ported to this package, but it is likely easier to simply depend on Boom and use the same functions. Some changes can be made for interface style consistency (snake case, etc.), but keeping it similar to bsts will make the interface familiar for bsts users and allow existing bsts resources to be usable for fable.bsts users.

I was thinking that it would be nice to use distributional::dist_normal() for Boom::NormalPrior(), however for some reason NormalPrior() also specifies MCMC parameters.

I'm not sure where the tibble/tsibble and key structures come into specifying the priors, could you go through your thoughts for this a bit more? Typically a tsibble (or its index) is provided when the special requires temporal information - I wouldn't have thought bsts priors vary over time?

davidtedfordholt commented 4 years ago

Sorry, you're right. I went for the simplest example of specifying a prior, rather than one that includes information of this sort. In the example below (from Stitchfix, you can see that specific external regressors can be given different priors for inclusion probability and coefficient estimates.

library(bsts)
data(iclaims)

prior.spikes <- c(0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,  1  ,0.1)
prior.mean <- c(0,0,0,0,0,0,0,0,0,  0.6  ,0)

### Set up the priors
prior <- SpikeSlabPrior(x=model.matrix(iclaimsNSA ~ ., data=initial.claims), 
                        y=initial.claims$iclaimsNSA, 
                        prior.information.weight = 200,
                        prior.inclusion.probabilities = prior.spikes,
                        optional.coefficient.estimate = prior.mean)

That could be added through a specific use of xreg() (rather than just passing the regressor's name), and then having a default specified for any regressor that isn't given an explicit prior. It could be specified through a tibble with the same keys as the tsibble, with the corresponding column in a priors tibble specified for where they integrate.

When specifying holidays, those can be modeled in different ways, and can have individual priors. That points in the direction of a tsibble, especially if we are specifying priors that include an expectation of a spike in one year's instance of a holiday that we don't expect in the future (last year's Golden Week in Japan being longer than usual, because of royal ascension, etc). If the data, itself, was in a tsibble that included information about whether or not the day was a holiday (think the example in fasster), that could also be a way of passing holiday information along. It would get pretty complex with respect to priors, though, unless those were in a tibble based on the holiday name.

davidtedfordholt commented 4 years ago

As an added bonus, we can look at some of how TensorFlow implemented structural time series, but it doesn't lend itself to an intuitive reading of the model structure in the code.

mitchelloharawild commented 4 years ago

Even for the exogenous regressors, it is one prior per regressor.

Is your main concern with the key structure to have different priors for different keys? If so, model_when() may be helpful here: https://github.com/tidyverts/fable/issues/54 Later, I plan to add support for a recipes style interface for building model formulae. This would be a better interface for data.frame specification of priors across multiple keys.

For holidays, I still am unsure how best to represent them and their parameters. The data approach used by prophet is pretty good, and what I'm defaulting to for now. I think holidays need an identifying key to match holidays across days/years.

For fable.bsts, keeping a similar structure to bsts will keep things simple. Some simplification may be possible by cleverly filling in some arguments of the state.specification (such as the data). Perhaps ask the bsts maintainer about interface changes that they would make if they had the opportunity.

davidtedfordholt commented 4 years ago

For exogenous regressors, I suppose you could require that, if priors are to be specified, each regressor is specified either individually with a prior or as a group (with potentially one member) with a prior that is the same for all members of the group. It takes away the ability to specify an exogenous regressor by just adding the variable name, which the designer in me mourns, but is probably the right answer.

Actually, that wasn't even a thing I was thinking about, but I looked at model_when() and think it is a good solution to that problem. I need to update the data I use in my scratch example to have more than one data stream.

mitchelloharawild commented 4 years ago

If you want to specify exogenous regressors without xreg() or some other special, I don't see how else you would add the priors. Keeping a default (diffuse?) prior as the default would allow users to add regressors to the model by name. All unnamed exogenous regressors will be combined into one call to xreg, so it would make sense to have the xreg() special accept a vector of prior parameters.

xreg(y + z, prior = spike_slab_prior(prior_spikes = c(0.1, 0.1), prior_mean = c(0, 0)))

I'm again unsure if it would be worthwhile to write wrappers for the priors. Doing so will make for a more style consistent interface, but it would be a decent amount of work and would make it harder to translate between bsts and fable.bsts.

davidtedfordholt commented 4 years ago

When I talk about being able to specify exogenous regressors without a call to xreg(), I am assuming that a default value for those would be passed for priors. It's partly me thinking through how problematic it would be to define such defaults. Essentially, you would take all of your calls to xreg() and combine them, adding defaults for those regressors that were defined without an explicit call to xreg() and the priors defined in the xreg() call for the others.

xreg() (sorry, just felt the need to type it one more time)

As to writing wrappers, I'm not sure that the main idea is to allow for direct translation, as the people using bsts in enough depth to be setting priors now are folks who likely understand the ideas behind it well enough that they can figure out a somewhat different method. That said, I think it might often just be a direct call to one of the types of priors, with the ability to set that you want to pass on to a specific different prior to the one most likely to be needed. So it might just be a matter of defining the most common prior function for a given prior, but wrapping that and the bility to call something else, when acceptable. Does that make sense?

mitchelloharawild commented 4 years ago

Everything you've said here sounds reasonable. If having defaults for exogenous regressors is too troubling, you could raise a warning if they haven't been specified.

As for wrapping bsts, I also agree that a direct translation isn't always best. For example, in fable.prophet I specify the seasonal period relative to the data's interval, whereas prophet defines it relative to daily data. I do think that it should be easy enough to produce the same/equivalent model via the wrapper - whether that is a simple model or a highly specified model that an advanced user may want. Having the ability to call different priors should work for this.

davidtedfordholt commented 4 years ago

Essentially, I'm envisioning a prior() function that takes arguments based on either a specified type, or for the default type based on the special in which it is defined. Pass the arguments from prior() into the special it's called in, then parse it out based on the specifications of the special, unless it comes with its own. I'll have to work through how to best document that, such that people can understand the pattern, but I think that would show up in the documentation for a given special.