WillianFuks / tfcausalimpact

Python Causal Impact Implementation Based on Google's R Package. Built using TensorFlow Probability.
Apache License 2.0

unstable predictions #21

Open shnaqawi opened 3 years ago

shnaqawi commented 3 years ago

This is great work. Thank you for putting this together.

I have noticed that when I include several covariates in the model, the results become very unstable. On the same data, the result can be anything from significant to non-significant (p-value above or below alpha), and the relative impact can flip from positive to negative. Any idea what the issue could be? I tried different learning rates for the optimizer but had no luck. Running the algorithm in R on the same data doesn't show any instability.

WillianFuks commented 3 years ago

Hi @shnaqawi ,

I'm not sure what's going on. If you could share some data (transformed is fine) so that I can replicate the issue here, that'd be helpful.

Also, did you try both algorithms, 'hmc' and 'vi'? Do you get the same behavior with both?
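
In case it helps, something along these lines lets you compare the two fitters side by side (a rough sketch: the toy data is just a stand-in for yours, and `model_args={'fit_method': ...}` is the switch described in the README):

```python
import numpy as np
import pandas as pd
from causalimpact import CausalImpact

# Toy stand-in data (replace with your own frame: response first, covariates after).
rng = np.random.default_rng(1)
x = 100 + np.cumsum(rng.normal(0, 1, 100))
y = 1.5 * x + rng.normal(0, 2, 100)
y[70:] += 5                                   # small effect after the intervention
data = pd.DataFrame({'y': y, 'x': x})
pre_period, post_period = [0, 69], [70, 99]

# 'vi' (variational inference) is faster; 'hmc' samples the posterior with
# Hamiltonian Monte Carlo and tends to be more stable.
ci_vi = CausalImpact(data, pre_period, post_period, model_args={'fit_method': 'vi'})
ci_hmc = CausalImpact(data, pre_period, post_period, model_args={'fit_method': 'hmc'})

print(ci_vi.summary())
print(ci_hmc.summary())
```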

sp-alicia-horsch commented 3 years ago

Hi, I wanted to open a new issue but maybe this is related. First, thanks @WillianFuks for this awesome package. The readme says that variational inference is the default for this package, yet model.py line 319 has `method: str = 'hmc'`, which defines the default, no? I could be wrong and it might be determined somewhere else, but maybe this could explain the unstable predictions?

WillianFuks commented 3 years ago

Hi @sp-alicia-horsch ,

The default fit method is indeed 'vi'. It is set during input processing, when the arguments for the model are arranged.

Are you also observing unstable predictions? If so, please share more information; that would be helpful, as I'm not sure yet what might be going on.

shnaqawi commented 3 years ago

Hi @WillianFuks,

Thank you for your response. With the 'HMC' algorithm, the p-value usually stays stable but the impact itself doesn't (the relative impact is reported as negative in some runs and positive in others). It's all very stable in R.

I just emailed you some synthetic data that I used in my case.

WillianFuks commented 3 years ago

Hi @shnaqawi ,

I ran some simulations on your data with 'hmc' and also observed higher variation for the relative effects. Unfortunately it does seem to be inherent to how the algorithm works (for the 'vi' algorithm this is quite expected as well).

The other metrics seem to be fairly stable, but since the relative effect is a non-linear (ratio) function of the prediction, its deviation ends up being more pronounced.
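
To make the ratio point concrete, here is a toy numpy sketch (not the library's internals, and the numbers are made up):

```python
import numpy as np

# Suppose refitting the model gives counterfactual predictions that wobble by
# ~3% around the truth, while the observed cumulative response stays fixed.
rng = np.random.default_rng(0)
observed = 1050.0                                       # cumulative observed response
predicted = 1000.0 * rng.normal(1.0, 0.03, size=1000)   # predictions across refits

relative_effect = (observed - predicted) / predicted
print('relative effect across refits: %.1f%% +/- %.1f%%'
      % (100 * relative_effect.mean(), 100 * relative_effect.std()))
print('fraction of refits with a negative relative effect: %.1f%%'
      % (100 * (relative_effect < 0).mean()))
# A prediction that is itself stable to within ~3% still yields relative
# effects roughly between 2% and 8%, occasionally even flipping sign.
```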

I tried running the same data through the R package but unfortunately couldn't make it work here (not enough memory on my machine at the moment). Could you please share the results you get from the R package?

As a final note, I'll leave this issue open in the hope that someone in the community has an idea of how to work around it.

rangelrey commented 2 years ago

Hi Willian! Great work here! Happy to see such a polished package. I am also experiencing the issues described above: I run the model several times and each time I get very different results, sometimes a p-value < 0.05 and sometimes > 0.2. Thanks!

WillianFuks commented 2 years ago

Hi @rangelrey ,

Just to confirm, are you using the 'vi' algorithm? Do you get the same behavior with 'hmc'?

rangelrey commented 2 years ago

Hi @WillianFuks , yes, I am using 'vi'. Below is a list of the p-values I get with 'vi': 0.08, 0.06, 0.17, 0.01, 0.09, 0.09, 0.12, 0.00, 0.00, 0.05, 0.09, 0.02. The p-value jumps from significant to non-significant quite often, and the relative effect varies from 7% to 11%.

With 'hmc': 0.04, 0.01, 0.03, 0.04, 0.05, 0.03, 0.07, 0.02, 0.05, 0.03, 0.02, 0.06, 0.03. 'hmc' seems more stable, but there are still some discrepancies.

The relative effect is almost the same in all trials, always around 5.5% in my case.

I am simply rerunning my Jupyter cell.
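
This is roughly what I'm doing, just in a loop instead of rerunning the cell by hand (the synthetic frame below only stands in for my real data, and I read the numbers off `ci.p_value` and `ci.summary_data`; adjust if your version exposes them differently):

```python
import numpy as np
import pandas as pd
from causalimpact import CausalImpact

# Synthetic stand-in data so the loop runs end to end; swap in your own frame.
rng = np.random.default_rng(42)
x = 100 + np.cumsum(rng.normal(0, 1, 120))
y = 1.2 * x + rng.normal(0, 2, 120)
y[100:] += 5                               # small intervention effect after t=100
data = pd.DataFrame({'y': y, 'x': x})
pre_period, post_period = [0, 99], [100, 119]

p_values, rel_effects = [], []
for _ in range(10):
    ci = CausalImpact(data, pre_period, post_period, model_args={'fit_method': 'vi'})
    p_values.append(ci.p_value)
    # Average relative effect as shown in the summary table; if this row/column
    # layout differs in your version, read the value off ci.summary() instead.
    rel_effects.append(ci.summary_data.loc['rel_effect', 'average'])

print('p-values:         ', np.round(p_values, 3))
print('relative effects: ', np.round(rel_effects, 3))
```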

WillianFuks commented 2 years ago

Hi @rangelrey ,

The results for the 'vi' algorithm are indeed expected to vary that much (mainly because the algorithm approximates the posterior distribution of each component with independent Gaussians, which is not always adequate for time series modeling).

As for 'hmc', it looks like this small variability is also expected, so there isn't much to be done here. Notice that your p-values fall right around the 5% threshold, which explains why the conclusion changes so abruptly: a result of 4.99% is already considered significant, but 5% is not.

This cutoff seems a bit arbitrary, which is one of the main criticisms of frequentist testing; unfortunately there isn't much we can do about it here.

You could either increase the value of alpha (something like running `ci = CausalImpact(data, pre_period, post_period, alpha=0.07)`), which increases the false positive rate, or lower it, which increases the false negative rate. Your data context will probably be the best guide for that decision.
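
To make that concrete with the p-values you posted above, the significance call is just a comparison against alpha, so moving the threshold removes most of the flips:

```python
# p-values from the 'hmc' runs reported above.
p_values = [0.04, 0.01, 0.03, 0.04, 0.05, 0.03, 0.07, 0.02,
            0.05, 0.03, 0.02, 0.06, 0.03]

for alpha in (0.05, 0.07):
    significant = sum(p < alpha for p in p_values)
    print(f'alpha={alpha:.2f}: {significant}/{len(p_values)} runs called significant')
```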

attibalazs commented 1 year ago

Thanks for the awesome library. I'm having the same issue with 'vi'; when I switched to 'hmc' it's more stable but slower.