WillianFuks / tfcausalimpact

Python Causal Impact Implementation Based on Google's R Package. Built using TensorFlow Probability.
Apache License 2.0

Example of custom model doesn't work #48

Closed tarstars closed 2 years ago

tarstars commented 2 years ago

Hello Will,

It's me again.

This time I was trying to run the example with a customized model:

  import tensorflow_probability as tfp

  pre_y = data[:70, 0]
  pre_X = data[:70, 1:]
  obs_series = data.iloc[:, 0]
  local_linear = tfp.sts.LocalLinearTrend(observed_time_series=obs_series)
  seasonal = tfp.sts.Seasonal(nseasons=7, observed_time_series=obs_series)
  model = tfp.sts.Sum([local_linear, seasonal], observed_time_series=obs_series)

  ci = CausalImpact(data, pre_period, post_period, model=model)
  print(ci.summary())

There are a couple of comments for this code:

Keeping all this in mind, I wrote my own lunapark:

data = generate_data()
obs_series = data.iloc[:, 0].astype("float32")
regular_data = tfp.sts.regularize_series(series=obs_series)
local_linear = tfp.sts.LocalLinearTrend(observed_time_series=regular_data)
seasonal = tfp.sts.Seasonal(num_seasons=12, observed_time_series=regular_data)
model = tfp.sts.Sum([local_linear, seasonal], observed_time_series=regular_data)

pre_period = ["2012-01-01", "2014-12-01"]
post_period = ["2015-01-01", "2016-11-01"]
ci = CausalImpact(regular_data, pre_period, post_period) # , model=model
print(ci.summary())

You can find the whole runnable example there.

The problem is that this code runs with the default model but, for some unknown reason, produces NaN values with the custom model. I'm looking forward to hearing from you on this.

WillianFuks commented 2 years ago

Hi @tarstars ,

Thanks again for pointing that out. I just updated the docs as you can see here. Please let me know what you think.

As for your example, the NaN values come from building the model on data that was not normalized. tfcausalimpact takes your un-normalized input data, normalizes it, and then uses the model to compute the probability of observing that normalized data, which comes out as zero.

This zero eventually ends up in a division somewhere, which produces the NaN values you observed.
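
To make the mechanism concrete, here is a toy NumPy sketch of the same kind of scale mismatch. It is only an illustration under simplifying assumptions and not the actual code path inside tfcausalimpact or TensorFlow Probability: the series values and the `gaussian_pdf` helper are made up, and a single Gaussian stands in for the structural time series model.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return np.exp(-0.5 * z**2) / (std * np.sqrt(2.0 * np.pi))


# Hypothetical raw series: values around 200 with a small spread.
raw = np.array([198.0, 201.0, 199.5, 202.0, 200.5])
mean, std = raw.mean(), raw.std()

# The same series after standardization (zero mean, unit variance).
normed = (raw - mean) / std

# A "model" calibrated on the raw scale scores raw data sensibly ...
lik_matched = gaussian_pdf(raw, mean, std)

# ... but scores the standardized data as impossible: the values sit
# roughly 150 standard deviations from the mean, so the density
# underflows to exactly 0.0.
lik_mismatched = gaussian_pdf(normed, mean, std)

# Any later normalization then computes 0 / 0, which is NaN.
with np.errstate(invalid="ignore"):
    weights = lik_mismatched / lik_mismatched.sum()

print(lik_matched)     # all strictly positive
print(lik_mismatched)  # all zeros
print(weights)         # all NaN
```

The point is only that the scale the model was built on and the scale of the data it is asked to score have to agree.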

The fix is quite simple: you just need to normalize the data before building the model with it:

import tensorflow_probability as tfp
from causalimpact import CausalImpact
from causalimpact.misc import standardize

pre_period = ["2012-01-01", "2014-12-01"]
post_period = ["2015-01-01", "2016-11-01"]
data = tfp.sts.regularize_series(generate_data())
normed_data = standardize(data)[0].astype('float32')
obs_series = normed_data.loc[:pre_period[1], 0]

local_linear = tfp.sts.LocalLinearTrend(observed_time_series=obs_series)
seasonal = tfp.sts.Seasonal(num_seasons=12, observed_time_series=obs_series)
model = tfp.sts.Sum([local_linear, seasonal], observed_time_series=obs_series)

ci = CausalImpact(data, pre_period, post_period, model=model)
print(ci.summary())
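
For reference, the normalization step is a z-score transform. The sketch below shows the idea in plain NumPy. The helper names `zscore` and `undo_zscore` are hypothetical and this is not the implementation of `causalimpact.misc.standardize`; it only mirrors the pattern used above, where the first returned element is the normalized data.

```python
import numpy as np


def zscore(values):
    """Scale each column to zero mean and unit variance.

    Returns the normalized array plus the (mean, std) pair needed to
    map results back to the original scale.
    """
    values = np.asarray(values, dtype=np.float32)
    mu = values.mean(axis=0)
    sigma = values.std(axis=0)
    return (values - mu) / sigma, (mu, sigma)


def undo_zscore(normed, mu, sigma):
    """Map normalized values back to the original scale."""
    return normed * sigma + mu


# Hypothetical data: column 0 is the response, column 1 a covariate.
raw = np.array([[120.0, 10.0],
                [125.0, 12.0],
                [130.0, 11.0],
                [128.0, 13.0]])

normed, (mu, sigma) = zscore(raw)
restored = undo_zscore(normed, mu, sigma)

print(normed.mean(axis=0))  # close to 0 for every column
print(normed.std(axis=0))   # close to 1 for every column
```

Building the custom model on `normed[:, 0]` rather than `raw[:, 0]` keeps it on the same scale as the data it will later be scored against.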

Let me know if this works for you. As soon as everything is confirmed I'll merge the new code into master.

Best,

Will

tarstars commented 2 years ago

Will,

Thank you very much for the prompt reaction. I just tested your patch and it works. Ship it!

Arseniy

WillianFuks commented 2 years ago

Thanks @tarstars for the reply. The new code has just been released.

I'll close this thread now. If you find anything else, please let me know.

Best!

Will