WillianFuks / tfcausalimpact

Python Causal Impact Implementation Based on Google's R Package. Built using TensorFlow Probability.
Apache License 2.0

Categorical Variables #75

Closed mekazanc closed 8 months ago

mekazanc commented 8 months ago

Hi again, I have a little question :)

Assume that we have an ordinal categorical variable (e.g. app_version) whose correlation with the target is greater than 0.6, where the target is app installs.

Can we give this categorical variable directly to the model, or should we transform it somehow first and then give it to the model?

My observation is that all of the control variables (e.g. installs in different countries) have the same unit as the target, but this variable does not. Also, after a few runs, the model did not give it a high coefficient (< 0.1) even though it has a high correlation with the target!

I could not find any example regarding categorical variables in the sample notebook, so I wanted to ask here.

Thanks in advance

WillianFuks commented 8 months ago

Hi @mekazanc ,

There's no treatment of categorical variables in this package. All input treatment happens in this function, and it is mostly concerned with the structure of the input.

Probably the best workaround is to apply a one-hot-encoding transformation before processing the data.
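As a rough sketch of that workaround, the snippet below one-hot encodes a categorical column with pandas before the data would be handed to the model. The data, column names, and dates are hypothetical and only for illustration. Dropping one level (`drop_first=True`) is an optional choice made here to avoid perfectly collinear indicator columns; it is not something the package requires.

```python
import pandas as pd

# Hypothetical data: the response (installs) comes first, followed by
# one numeric control and one categorical column.
data = pd.DataFrame(
    {
        "installs": [120.0, 130.0, 125.0, 160.0, 170.0, 165.0],
        "installs_other_country": [80.0, 85.0, 82.0, 90.0, 95.0, 93.0],
        "app_version": ["1.0", "1.0", "1.1", "1.1", "2.0", "2.0"],
    },
    index=pd.date_range("2023-01-01", periods=6, freq="D"),
)

# Replace the categorical column with 0/1 indicator columns.
# dtype=float keeps every column numeric; drop_first=True drops the
# first level ("1.0"), which then acts as the baseline.
encoded = pd.get_dummies(
    data,
    columns=["app_version"],
    prefix="app_version",
    drop_first=True,
    dtype=float,
)

print(encoded.columns.tolist())

# The encoded frame could then be passed to the model, for example:
#   from causalimpact import CausalImpact
#   ci = CausalImpact(encoded, pre_period, post_period)
# where pre_period and post_period are chosen for your own experiment.
```

`pd.get_dummies` leaves the untouched columns in their original order and appends the indicator columns after them, so the response stays in the first column.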

You mentioned that the model assigned low weights to this variable. I'm assuming you passed it in as a float value. This will not work, as those numeric values have no real meaning, so any correlation extracted from them is due to noise, not signal.
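To make this concrete, here is a small sketch with made-up numbers and hypothetical category labels. It encodes the same categories with two different integer labelings and computes the Pearson correlation with the target for each. When the codes carry no numeric meaning, the correlation depends on the labeling chosen rather than on the data alone.

```python
import pandas as pd

# Hypothetical target and a categorical variable with three levels.
installs = pd.Series([100.0, 100.0, 150.0, 150.0, 120.0, 120.0])
category = pd.Series(["A", "A", "B", "B", "C", "C"])

# Two equally arbitrary ways of turning the labels into numbers.
codes_alphabetical = category.map({"A": 0.0, "B": 1.0, "C": 2.0})
codes_shuffled = category.map({"A": 0.0, "B": 2.0, "C": 1.0})

# Pearson correlation of each coding with the target.
corr_alphabetical = installs.corr(codes_alphabetical)
corr_shuffled = installs.corr(codes_shuffled)

# Same data, different labelings, different correlations.
print(corr_alphabetical, corr_shuffled)
```

One-hot indicator columns do not have this problem, since each level gets its own 0/1 column and no ordering or spacing between levels is implied.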

mekazanc commented 8 months ago

I got it! Thanks for your explanation.