WillianFuks / tfcausalimpact

Python implementation of Google's CausalImpact R package, built on TensorFlow Probability.
Apache License 2.0

I see different forecasted values every time I run CausalImpact with the same parameters #55

Open agupta62222222333333333 opened 2 years ago

agupta62222222333333333 commented 2 years ago

Please help me with a fix. I am using Python 3.6.

WillianFuks commented 2 years ago

Hi @agupta62222222333333333 ,

Without sample code it's hard to tell what might be going on. If you can share what you are doing, it will help us guide you better.

As a guess, what you are observing may be the result of using variational inference as the main fitting method. In that case, it's expected that you'll get somewhat different results each time you run the code. To confirm, you could try changing the fit method to 'hmc', like so:

ci = CausalImpact(data, pre_period, post_period, model_args={'fit_method': 'hmc'})

This method, whilst slower, is expected to be more stable, so it might help you.
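
As a rough illustration, here is a minimal self-contained sketch contrasting the two fit methods (the synthetic data, column names, and periods are made up for the example; the first column of the DataFrame is the response, the rest are covariates):

import numpy as np
import pandas as pd
from causalimpact import CausalImpact

# Synthetic example: 120 daily points, response driven by one covariate.
np.random.seed(42)  # fixes the data itself, not the model's internal sampling
X = 100 + np.random.randn(120).cumsum()
y = 1.2 * X + np.random.randn(120)
data = pd.DataFrame({'y': y, 'X': X})

pre_period = [0, 89]
post_period = [90, 119]

# Default fitting uses variational inference, which is stochastic by
# nature, so summaries can differ between runs.
ci_vi = CausalImpact(data, pre_period, post_period)

# Hamiltonian Monte Carlo is slower but tends to be more stable.
ci_hmc = CausalImpact(data, pre_period, post_period, model_args={'fit_method': 'hmc'})

print(ci_vi.summary())
print(ci_hmc.summary())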

Let me know if it works for you.

Best,

Will

agupta62222222333333333 commented 2 years ago

Hey Willian, thanks for the quick response. I am trying this argument, but the run is taking a long time. I am also using the 'nseasons' and 'harmonics' arguments to account for seasonality. Is there any way I can optimize the process? I can also send you the exact code and dummy data if that would be helpful.

Thanks and Regards, Aditya Gupta
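
For context, a sketch of how seasonality is typically passed through model_args ('nseasons' and 'season_duration' are the keys documented in the tfcausalimpact README; I can't vouch for a 'harmonics' key; this reuses data and the periods from the sketch above):

# Daily data with weekly seasonality: 7 seasons, each lasting one step.
ci = CausalImpact(data, pre_period, post_period,
                  model_args={'fit_method': 'hmc',
                              'nseasons': 7,
                              'season_duration': 1})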


agupta62222222333333333 commented 2 years ago

Just to add: my pre-period is 4 months and my post-period is 1 month, and I am trying to make daily predictions with CausalImpact.

Thanks and Regards, Aditya Gupta


WillianFuks commented 2 years ago

Hi Gupta,

Unfortunately, when setting hmc, especially together with nseasons, fitting does get quite slow. The only thing that might help for now is using a GPU to see if things speed up a bit (maybe experimenting on Google Colab).

Other than that, there isn't much that can be done. I believe the TFP team is working on new Markovian sampling algorithms that will solve these problems much faster, but they haven't been released yet.
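
For what it's worth, a sketch of the two levers discussed here: confirming TensorFlow actually sees a GPU, and, assuming the 'niter' model_arg documented in the README (it controls the number of posterior samples), trading some precision for speed:

import tensorflow as tf

# Check whether TensorFlow sees a GPU (e.g. a Colab GPU runtime);
# HMC benefits considerably from one.
print(tf.config.list_physical_devices('GPU'))

# Fewer posterior samples run faster at the cost of noisier estimates.
# (Reuses data, pre_period and post_period from the earlier sketch.)
ci = CausalImpact(data, pre_period, post_period,
                  model_args={'fit_method': 'hmc', 'niter': 500})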

gallant-kinkajou commented 8 months ago

It looks like TFP now supports Gibbs sampling to some extent in tfp.experimental: https://www.tensorflow.org/probability/api_docs/python/tfp/experimental/sts_gibbs

Any estimate of how difficult this would be to integrate?

WillianFuks commented 8 months ago

Hi @gallant-kinkajou ,

This looks promising. I'm not sure we can use it yet, as the sts package would also need to be updated to work with it. Once that happens, implementing it on our side should be straightforward.

ardsnijders commented 2 months ago

Hi @WillianFuks,

Just to piggyback on this issue: I have run some experiments consisting of a series of identical runs with all parameters kept at their default values. In particular, my p-values vary considerably across identical runs. With the default fit method they are:

0.15, 0.1, 0.04, 0.03, 0.04, 0.09, 0.08, 0.02, 0.06, 0.02

With the hmc fit method they are slightly less variable, though still substantially so:

0.03, 0.02, 0.06, 0.07, 0.04, 0.04, 0.07, 0.04, 0.06, 0.12

I would expect these to be more stable given that everything is kept constant. Perhaps, due to the default value of niter=1000, the model has not yet converged, so the fit (and the associated p-values and so forth) is still somewhat unstable? Any ideas on what may cause this, or what other checks I could perform to diagnose the issue?

Thanks for your help.
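
A sketch of how such a stability check can be scripted, assuming ci.p_value (the posterior tail-area probability the package exposes) and the data and periods from the earlier sketch:

# Run the identical model several times and collect the p-values to
# quantify run-to-run variation.
p_values = []
for _ in range(10):
    ci = CausalImpact(data, pre_period, post_period,
                      model_args={'fit_method': 'hmc'})
    p_values.append(ci.p_value)
print(p_values)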

WillianFuks commented 2 months ago

Hi @ardsnijders ,

Is it possible to share a mock version of the dataset in which you observe this pattern? I'm not sure why this is happening; maybe the model parameters are not converging to a well-behaved normal distribution on this dataset, so sampling covers a wider range of values.

Experimenting with the default values of the HMC model would indeed be an interesting approach.
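
One such tweak, again assuming the 'niter' model_arg, would be to raise the number of posterior samples above the default of 1000 mentioned above and check whether the p-values tighten:

ci = CausalImpact(data, pre_period, post_period,
                  model_args={'fit_method': 'hmc', 'niter': 2000})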

ardsnijders commented 2 months ago

Hi @WillianFuks,

Thanks for your response. Unfortunately I can't share the data here, though I might be able to replicate the issue with a synthesized dataset.

I did some further diagnostics and ran the library on pre-intervention slices of the data to check whether the model could recover the parameters of synthetic treatments. I found that even for baseline (pre-intervention, untreated) data, the model consistently reported significant positive 'effects' for various placebo 'treatment dates' (though it did recover the parameters of synthetic treatments layered on top of those quite accurately; a placebo check along these lines is sketched below).

Upon closer inspection I realized that my dependent variable exhibits an upward trend over time which is not captured by my (single) covariate. This may be why the model finds positive effects in the absence of any treatment, and could explain its instability. I should perhaps first identify a more suitable set of covariates and then re-assess the model.

If you have any good resources on incorporating trends into the model, that would be much appreciated. Thanks in advance for the help.
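
A sketch of the placebo check described above (indices are illustrative, reusing the synthetic data from the earlier sketch): restrict the data to the untreated pre-period and pretend part of it was treated; a well-specified model should find no effect there.

# All 90 pre-period points are untreated, so a 'treatment' at index 60
# is a placebo; its p-value should typically stay well above 0.05.
placebo_data = data.iloc[:90].reset_index(drop=True)
ci_placebo = CausalImpact(placebo_data, [0, 59], [60, 89],
                          model_args={'fit_method': 'hmc'})
print(ci_placebo.summary())
print(ci_placebo.p_value)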

WillianFuks commented 2 months ago

Hi @ardsnijders ,

See if using a LocalLinearTrend component helps in your case.

That would require a customized model; please refer to section 2.3 of the official docs for how to do so.
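
For reference, a sketch of what such a customized model might look like, loosely following the customized-model pattern from the docs (the component choices and the 'y'/'X' column names are illustrative; section 2.3 covers the exact requirements, e.g. around standardizing the data yourself):

import tensorflow_probability as tfp
from causalimpact import CausalImpact

# Components are built on the pre-intervention response only; the
# regression's design matrix spans the whole series so the model can
# forecast the post-period from the covariate.
pre_y = data['y'].iloc[:90]          # pre_period = [0, 89] as above
design_matrix = data[['X']].values

trend = tfp.sts.LocalLinearTrend(observed_time_series=pre_y)
seasonal = tfp.sts.Seasonal(num_seasons=7, observed_time_series=pre_y)
regression = tfp.sts.LinearRegression(design_matrix=design_matrix)

model = tfp.sts.Sum([trend, seasonal, regression],
                    observed_time_series=pre_y)

ci = CausalImpact(data, pre_period, post_period, model=model)
print(ci.summary())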