WillianFuks / tfcausalimpact

Python Causal Impact Implementation Based on Google's R Package. Built using TensorFlow Probability.
Apache License 2.0
600 stars, 72 forks

Change number of iterations of MCMC when fitting the model #30

Open · alessandropicca opened this issue 2 years ago

alessandropicca commented 2 years ago

Hi Will,

First of all, thank you and congratulations on the great work of porting this package to Python.

At my company I am currently implementing a causal inference use case with your library, and I am struggling to change the number of iterations of the Markov chain estimation procedure.

In particular, the docstring does not list an `niter` option for `model_args`. However, when I print `ci.model_args` I get this dictionary:

`{'standardize': True, 'prior_level_sd': 0.01, 'nseasons': 7, 'season_duration': 1, 'fit_method': 'hmc', 'niter': 1000}`

which makes me think that changing the number of iterations is actually possible. With this in mind, I ran the following code:

`ci = CausalImpact(data, pre_period, post_period, model_args={"standardize": True, "prior_level_sd": 0.01, "nseasons": 7, "season_duration": 1, "fit_method": "hmc", "niter": 5000})`

after which `ci.model_args` was:

`{'standardize': True, 'prior_level_sd': 0.01, 'nseasons': 7, 'season_duration': 1, 'fit_method': 'hmc', 'niter': 5000}`

So I was sure I had found the way to change the number of iterations of the Markov chain procedure, but then I realized that every posterior sample drawn from this model has length 100, when I expected 5000. In other words, `ci.model_samples[i].shape` has 100 as its first dimension, not 5000, for every `i` in `range(len(ci.model_samples))`.

Could you please explain what this means, in particular what the relationship is between the length of the posterior samples and the number of iterations of the Markov chain procedure, and how to change it in general?

Thank you in advance for your support.

WillianFuks commented 2 years ago

Hi @alessandropicca ,

The `niter` value you referred to is used internally in the step that samples time series from the posterior distribution, as you can see here. In essence, it is used to compute the p-value; currently it's not possible to change it, as it doesn't affect the final results much.
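As a rough illustration of why `niter` mostly affects the resolution of the p-value rather than the estimates themselves, here is a minimal sketch of the posterior tail-area probability used by the original R package, assuming tfcausalimpact follows the same idea. The helper name `tail_p_value` and the simulated numbers are hypothetical. With `niter` simulated post-period sums, the smallest attainable p-value is `1 / (niter + 1)`.

```python
import random


def tail_p_value(simulated_sums, observed_sum):
    """Hypothetical helper: posterior tail-area probability from simulations.

    The observed sum is counted together with the simulated sums; the smaller
    of the two tail counts is divided by niter + 1.
    """
    values = list(simulated_sums) + [observed_sum]
    upper = sum(v >= observed_sum for v in values)  # upper-tail count
    lower = sum(v <= observed_sum for v in values)  # lower-tail count
    return min(upper, lower) / (len(simulated_sums) + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    niter = 1000
    # Pretend each simulated post-period sum is drawn from N(100, 5^2).
    sims = [rng.gauss(100.0, 5.0) for _ in range(niter)]
    print(tail_p_value(sims, 115.0))  # observed sum far in the upper tail
    print(1 / (niter + 1))  # smallest p-value attainable with this niter
```

Raising `niter` from 1000 to 5000 would mainly refine how small the reported p-value can get; the point estimates come from the posterior samples discussed next.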

As for the number 100 you mentioned, it is set when sampling the posterior of the model parameters. You can change that value in the source for either the `vi` or the `hmc` algorithm, which is equivalent to setting the number of samples drawn when running the Markov chain algorithm.
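To make the distinction concrete: the length of each entry in `model_samples` is the number of draws kept from the sampler, and `niter` plays no part in it. The HMC path appears to rely on `tfp.sts.fit_with_hmc`, whose `num_results` argument defaults to 100, which would match the 100 observed above. The sketch below swaps in a plain random-walk Metropolis sampler (not HMC) on a toy one-dimensional target so that it runs with the standard library alone; `random_walk_metropolis` is hypothetical, and its `num_results` / `num_warmup_steps` parameters are only named after the TFP arguments.

```python
import math
import random


def random_walk_metropolis(log_prob, num_results, num_warmup_steps=50,
                           step_size=1.0, init=0.0, seed=0):
    """Hypothetical toy sampler: returns exactly `num_results` post-warmup draws."""
    rng = random.Random(seed)
    state, state_lp = init, log_prob(init)
    draws = []
    for step in range(num_warmup_steps + num_results):
        proposal = state + rng.gauss(0.0, step_size)
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(state));
        # 1.0 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - state_lp:
            state, state_lp = proposal, proposal_lp
        if step >= num_warmup_steps:  # keep only post-warmup draws
            draws.append(state)
    return draws


def log_prob(x):
    # Unnormalised log density of a toy N(3, 1) "posterior".
    return -0.5 * (x - 3.0) ** 2


if __name__ == "__main__":
    for n in (100, 5000):
        samples = random_walk_metropolis(log_prob, num_results=n)
        # The sample length follows num_results, whatever niter might be.
        print(n, len(samples), sum(samples) / len(samples))
```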

It would also be possible to extend the main API to accept those values as input. Unfortunately I don't have time to do so right now, but if the community sends a PR I'd happily merge it.
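One possible shape for such a PR, sketched under the assumption that a new `model_args` key (called `num_results` here purely for illustration; it is not an existing tfcausalimpact option) would be validated alongside the existing keys and then forwarded to the fitting step. The helper `resolve_num_results` and the constant `DEFAULT_NUM_RESULTS` are hypothetical.

```python
DEFAULT_NUM_RESULTS = 100  # hypothetical default, mirroring the 100 draws seen above


def resolve_num_results(model_args):
    """Hypothetical: read and validate an optional 'num_results' model_args entry."""
    num_results = model_args.get('num_results', DEFAULT_NUM_RESULTS)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(num_results, bool) or not isinstance(num_results, int) or num_results < 1:
        raise ValueError('num_results must be a positive integer.')
    return num_results


# The fitting step could then forward the value, e.g. for the HMC path:
#   tfp.sts.fit_with_hmc(model, observed_time_series, num_results=num_results)
```

Keeping the default at 100 would leave existing results unchanged for users who don't set the new key.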

Hope that helps, let me know if you still have any questions.

Best,

Will