FMenchetti / CausalArima

An R package to estimate the effect of interventions on univariate time series using ARIMA models

Control group #26

Open schneiderpy opened 11 months ago

schneiderpy commented 11 months ago

Dear Fiammetta, I came across your package and I would like to know whether there is a way to include a control group in the C-ARIMA approach.

Thank you in advance,
Andreas

FMenchetti commented 11 months ago

Dear Andreas, thank you for your question! Yes, you can include a control group among the covariates. To do that, use the xreg option of the CausalArima function. The xreg option accepts a vector, matrix or data.frame of regressors that help explain the outcome y in the absence of the intervention, so it can hold both covariates and a control group, if available. Please feel free to reach out if you have any further questions.

schneiderpy commented 11 months ago

Thank you, Fiammetta, for your quick reply. I will give it a try.

schneiderpy commented 11 months ago

Dear Fiammetta, a quick question: I suppose that I have to change from long to wide format to include the control group values (as a time series) and then construct xreg like this: xreg = cbind(valuescontrolgroup, grouptreatment, grouptrend, othercovariate)
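A minimal base-R sketch of the long-to-wide step, assuming a hypothetical long data frame with columns date, group and value (all names and numbers below are made up). In wide format the treated series becomes y and the control series becomes a column of xreg. Note that cbind() takes its regressors as separate, comma-separated arguments; joining them with + adds the series element-wise into a single column. Presumably xreg also needs one row per time point of y, covering both the pre- and post-intervention periods, since the counterfactual forecast uses the regressors.

```r
# Hypothetical long-format data: one row per (date, group); names are illustrative only
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 6)
long <- data.frame(
  date  = rep(dates, times = 2),
  group = rep(c("treated", "control"), each = 6),
  value = c(10, 12, 11, 5, 4, 6,    # treated series
            9, 11, 10, 10, 12, 11)  # control series
)

# Long -> wide: one row per date, one value column per group
# (reshape() names the columns value.treated and value.control)
wide <- reshape(long, idvar = "date", timevar = "group", direction = "wide")

# Outcome: the treated series
y <- wide$value.treated

# Regressors: commas give one column per regressor
othercovariate <- c(1.2, 0.8, 1.1, 1.0, 0.9, 1.3)  # hypothetical covariate
xreg <- cbind(control   = wide$value.control,
              trend     = seq_along(y),
              covariate = othercovariate)

# Using '+' instead sums the series element-wise into ONE column
wrong <- cbind(wide$value.control + seq_along(y) + othercovariate)
```

The resulting xreg matrix is what would be passed through the xreg option described above.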

schneiderpy commented 10 months ago

Dear Fiammetta, I am trying to interpret the results of the main function, CausalArima(), and in particular the p-values that indicate significance. How can I obtain these? For example, I get the following results:

summary_model
$impact_norm
$impact_norm$average
   estimate       sd p_value_left p_value_bidirectional p_value_right
1 -44.29584 7.963023 1.328225e-08          2.656451e-08             1

$impact_norm$sum
   estimate       sd p_value_left p_value_bidirectional p_value_right
1 -1417.467 254.8167 1.328225e-08          2.656451e-08             1

$impact_norm$point_effect
   estimate       sd p_value_left p_value_bidirectional p_value_right
1 -66.95355 21.28963  0.000830745            0.00166149     0.9991693

$impact_boot
$impact_boot$average
                 estimates         inf         sup         sd
observed         66.593750          NA          NA         NA
forecasted      110.889589  97.5934997 125.9130978 7.44123098
absolute_effect -44.295839 -59.3193478 -30.9997497 7.44123098
relative_effect  -0.399459  -0.5349406  -0.2795551 0.06710487

$impact_boot$effect_cum
                   estimates           inf          sup           sd
observed         2131.000000            NA           NA           NA
forecasted       3548.466840  3122.9919905 4029.2191295 238.11939133
absolute_effect -1417.466840 -1898.2191295 -991.9919905 238.11939133
relative_effect    -0.399459    -0.5349406   -0.2795551   0.06710487

$impact_boot$p_values
alpha    p
 0.05 0.00

The result should be significant in some way ...
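A quick arithmetic check on the $impact_boot tables, using only values copied from them. The relationships below are inferred from the printed numbers, not taken from the package source: the absolute effect appears to be observed minus forecasted, the relative effect appears to be the absolute effect divided by the forecast, and the cumulative table appears to be the average table scaled by the number of post-intervention points (2131 / 66.59375 = 32).

```r
# Values copied from $impact_boot$average and $impact_boot$effect_cum above
observed_avg   <- 66.593750
forecasted_avg <- 110.889589
observed_cum   <- 2131

# Absolute and relative effects (relationships inferred from the output)
abs_effect <- observed_avg - forecasted_avg   # table above: -44.295839
rel_effect <- abs_effect / forecasted_avg     # table above: -0.399459

# Number of post-intervention points implied by the two tables
n_post <- observed_cum / observed_avg

# Cumulative quantities as the average scaled by n_post
forecasted_cum <- n_post * forecasted_avg     # table above: 3548.466840 (up to rounding)
abs_effect_cum <- n_post * abs_effect         # table above: -1417.466840 (up to rounding)
```

This would also explain why relative_effect is identical in both tables: numerator and denominator are both multiplied by 32.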

FMenchetti commented 10 months ago

Dear Andreas, it seems that the results are significant. For example, under the Normality assumption the estimated cumulative effect is -1417.5, its estimated standard deviation is 254.8, and the bidirectional p-value is about 2.7e-08, i.e. practically 0, meaning that you reject the null hypothesis that the cumulative effect is 0. If summary_model is the output of CausalArima(), you can also use summary(summary_model), plot the causal effect with plot(summary_model, type = "impact"), or plot the observed against the forecasted series with plot(summary_model, type = "forecast").
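The normal-based p-values in $impact_norm appear consistent with a standard z-test on estimate / sd: left tail, right tail, and twice the smaller tail for the bidirectional value. This sketch is an assumption about how they are computed, checked only against the printed numbers; the helper norm_p_values is hypothetical and not part of CausalArima, and only base R's pnorm() is used.

```r
# Hypothetical helper: p-values of a normally distributed effect estimate
norm_p_values <- function(estimate, sd) {
  z <- estimate / sd                              # z-statistic under H0: effect = 0
  p_left  <- pnorm(z)                             # H1: effect < 0
  p_right <- pnorm(z, lower.tail = FALSE)         # H1: effect > 0
  c(p_value_left          = p_left,
    p_value_bidirectional = 2 * min(p_left, p_right),
    p_value_right         = p_right)
}

# Estimates and standard deviations copied from $impact_norm above
p_sum   <- norm_p_values(-1417.467, 254.8167)
p_point <- norm_p_values(-66.95355, 21.28963)
```

Since the average is the sum divided by the number of post-intervention points, estimate / sd (and hence every p-value) is the same for $average and $sum.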

schneiderpy commented 10 months ago

Dear Fiammetta, thank you for your prompt reply. What does the $impact_boot$p_values indicate?

palmierieugenio commented 9 months ago

In CausalArima, the p-values are defined, as in CausalImpact (Brodersen et al.), as:

min(mean(y.samples.post.sum >= y.post.sum), mean(y.samples.post.sum <= y.post.sum))

which is very similar to the formula used in CausalImpact:

p <- min(sum(c(y.samples.post.sum, y.post.sum) >= y.post.sum), sum(c(y.samples.post.sum, y.post.sum) <= y.post.sum)) / (length(y.samples.post.sum) + 1)

The difference is that CausalImpact adds one to the denominator (and, because y.post.sum itself is included in the comparison, one to the numerator), which avoids p-values of exactly 0.
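To illustrate the difference, here is a sketch that applies both formulas, exactly as quoted above, to simulated draws. The draws come from rnorm() purely as a stand-in for the package's bootstrap samples (an assumption), with mean and sd loosely taken from the first output (forecasted sum about 3548, sd about 238) and the observed sum of 2131.

```r
set.seed(1)
# Hypothetical stand-in for the simulated post-period sums of the counterfactual
y.samples.post.sum <- rnorm(1000, mean = 3548, sd = 238)
y.post.sum <- 2131  # observed post-period sum from the first output

# CausalArima-style value (first formula above)
p_carima <- min(mean(y.samples.post.sum >= y.post.sum),
                mean(y.samples.post.sum <= y.post.sum))

# CausalImpact-style value (second formula above)
p_ci <- min(sum(c(y.samples.post.sum, y.post.sum) >= y.post.sum),
            sum(c(y.samples.post.sum, y.post.sum) <= y.post.sum)) /
  (length(y.samples.post.sum) + 1)
```

Because the observed sum lies below every simulated draw, the first formula returns exactly 0, consistent with the p = 0.00 shown in the first output, while the CausalImpact version returns 1/1001.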

It should be mentioned that, in both formulas, these values are not strictly p-values and would more properly be called (one minus) the Probability of Direction. We kept the term p-value to keep the names consistent with CausalImpact; they are also essentially the Bayesian counterpart of p-values, and the term Probability of Direction is less well known.

In practice, this measure is the one-sided tail-area probability of the overall impact over the entire post-intervention period, i.e. of the total (or, equivalently, the average) effect.
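A sketch of the relation between this tail-area value and the Probability of Direction (pd), again on hypothetical rnorm() draws and a made-up observed sum of 3300: the value equals 1 - pd of the cumulative effect, and computing it on averages instead of sums (dividing by the 32 post-intervention points implied by the first output) gives the same number.

```r
set.seed(2)
n_post <- 32  # post-period length implied by the first output (2131 / 66.59375)
y.samples.post.sum <- rnorm(1000, mean = 3548, sd = 238)  # hypothetical draws
y.post.sum <- 3300                                         # hypothetical observed sum

# Tail-area value as in the formula quoted above
p <- min(mean(y.samples.post.sum >= y.post.sum),
         mean(y.samples.post.sum <= y.post.sum))

# Probability of Direction of the cumulative effect (observed - counterfactual)
effect <- y.post.sum - y.samples.post.sum
pd <- max(mean(effect > 0), mean(effect < 0))

# Same comparison on averages: dividing both sides by n_post preserves the order
p_avg <- min(mean(y.samples.post.sum / n_post >= y.post.sum / n_post),
             mean(y.samples.post.sum / n_post <= y.post.sum / n_post))
```

With continuous draws there are no ties, so p and pd always sum to one, and the total and average effects yield the same tail probability.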

schneiderpy commented 9 months ago

Thank you @palmierieugenio for the clarification.

schneiderpy commented 8 months ago

Hello Eugenio, unfortunately I am still confused by your p-value(s). For example, the output below shows a significant bidirectional p-value for the cumulative sum (actually, all three bidirectional p-values are significant). However, the "overall" p-value is not significant. Which one should I use?

$impact_norm
$impact_norm$average
  estimate       sd p_value_left p_value_bidirectional p_value_right
1 -43.4874 6.467814  8.86062e-12          1.772116e-11             1

$impact_norm$sum
   estimate       sd p_value_left p_value_bidirectional p_value_right
1 -521.8488 77.61376  8.86062e-12          1.772116e-11             1

$impact_norm$point_effect
   estimate       sd p_value_left p_value_bidirectional p_value_right
1 -70.51046 22.40516 0.0008245977           0.001649195     0.9991754

$impact_boot
$impact_boot$average
                  estimates         inf         sup         sd
observed        111.8333333          NA          NA         NA
forecasted      155.3207303 105.0784663 126.4281383 5.61229305
absolute_effect -43.4873970 -14.5948050   6.7548671 5.61229305
relative_effect  -0.2799845  -0.0939656   0.0434898 0.03613357

$impact_boot$effect_cum
                   estimates          inf          sup          sd
observed        1342.0000000           NA           NA          NA
forecasted      1863.8487640 1260.9415952 1517.1376601 67.34751658
absolute_effect -521.8487640 -175.1376601   81.0584048 67.34751658
relative_effect   -0.2799845   -0.0939656    0.0434898  0.03613357

$impact_boot$p_values
alpha     p
0.050 0.243