Why Robyn Should Allow Negative Intercepts

extrospective commented 2 years ago

We've been actively using Robyn now for several months, and feel quite strongly that negative intercepts are critical. For our own use, we have experimented with a branch of Robyn which allows negative intercepts and one that does not, studying the implications.

While the need for negative intercepts might not be obvious from sales variables, it becomes more obvious when context variables are required.

For example, we want a context variable which tells us about the type of coupon being offered to customers. At first we thought that the "average coupon" would be sufficient, but on further exploration we decided also to evaluate a variable for the "coupon in relation to recent average discount". Of course, the possible variations are endless. As modelers we could try different versions of "relative to a moving average", and might arbitrarily scale and shift the variable.

As an aside: this variable will not intrinsically have the same scale as other media variables (which could be an issue for ridge regression since Robyn does not standardize variables, but let's park that for now). We find that other media variables might be in the tens of thousands to millions, but our context variable does not have to be as large.

As we tried different ways to define this context variable, we found that disallowing a negative intercept distorted the estimated relationships (coefficients) under certain circumstances. For example, we tried:

Raw difference (discount minus mean discount over the prior 30 days)
Scaled difference (the raw difference multiplied by 100)
Shifted difference (100 plus the raw difference)
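
To make the three definitions concrete, here is a minimal sketch of how they could be built. The discount series is synthetic and the helper name prior_mean is hypothetical; none of this is Robyn code or our actual data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic daily discount (in percent); stands in for the real coupon data.
discount = rng.uniform(5.0, 25.0, size=120)

def prior_mean(x, window=30):
    """Mean of the `window` values strictly before each position.

    Positions without a full window of history are left as NaN.
    """
    out = np.full(x.shape, np.nan)
    csum = np.concatenate(([0.0], np.cumsum(x)))  # csum[t] = sum of x[:t]
    for t in range(window, len(x)):
        out[t] = (csum[t] - csum[t - window]) / window
    return out

raw = discount - prior_mean(discount)  # raw difference
scaled = 100.0 * raw                   # scaled difference
shifted = 100.0 + raw                  # shifted difference
```

All three carry the same information about the coupon; only the units and the zero point differ, which is why we would expect a model to tell the same story for each.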

If we define this context variable in different ways, we may find ourselves requiring a negative intercept for model (coefficient) consistency. Without a negative intercept, we run into the same problem we would with any multiple linear regression that does not allow an intercept: in situations where the best fit calls for a negative intercept, the model is instead forced towards solutions which go through the origin, and the coefficients are biased as a result.

As an aside again: I want to make one observation here which may also surprise the reader, as it did us. Once we changed the Robyn code to "allow" for negative intercepts, it changed possible solutions, even for positive intercepts. It was as if distorting / constraining the search area along the way had changed the ultimate solutions!

Rather than attach our example data here, I am going to step back to simple regressions and make our case.

With Excel (attached) I have defined an arbitrary X-Y relationship, and then simply shifted the X variable to show the problem of not having negative intercepts: excel_shifted_variable_simple_example.xlsx

In the first chart, imagine that X is the context variable and Y is the target variable. We find ourselves with a positive intercept, and no problem.

But in the second chart, we have simply defined X somewhat differently (we have added 100 to each value in the original chart). The slope is identical, but the intercept is now negative. Note that the fitted equation is otherwise the same; only the intercept has changed. If we did not allow the negative intercept in this case, we would have changed the slope of the line, and hence the estimated relationship between X and Y. Yet the relationship between variation in X and variation in Y has not changed, so we want the same slope (coefficient).
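
The same effect can be reproduced outside Excel. The sketch below uses made-up numbers (not the values in the attached workbook), and the function fit_line with its non_negative_intercept flag is a hypothetical helper, not Robyn's implementation. For a single-variable least-squares fit, a non-negative intercept constraint that binds is equivalent to refitting through the origin, which is what the helper does.

```python
import numpy as np

def fit_line(x, y, non_negative_intercept=False):
    """Least-squares line y = intercept + slope * x.

    If the constraint is on and the unconstrained intercept is negative,
    the constrained optimum has intercept 0, so refit through the origin.
    """
    slope, intercept = np.polyfit(x, y, 1)
    if non_negative_intercept and intercept < 0:
        slope = (x @ y) / (x @ x)
        intercept = 0.0
    return float(slope), float(intercept)

x = np.arange(1.0, 11.0)
y = 50.0 + 2.0 * x        # exact line: slope 2, intercept 50
x_shifted = x + 100.0     # same variable, redefined with +100

# Original definition: positive intercept, so the constraint never binds.
slope_a, icpt_a = fit_line(x, y, non_negative_intercept=True)

# Shifted definition, unconstrained: slope 2, intercept -150.
slope_b, icpt_b = fit_line(x_shifted, y)

# Shifted definition, constrained: intercept pinned at 0 and the
# slope falls to roughly 0.58 to compensate.
slope_c, icpt_c = fit_line(x_shifted, y, non_negative_intercept=True)
```

Nothing about the relationship between X and Y changed between the last two fits; only the constraint did.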

These Excel examples may seem abstract, but they avoid posting proprietary data and they boil the problem down to the simplest case. I hope this is sufficient to understand why this arises for us. If there is more support or further examples we can provide, I am happy to do so.

Our problem: We have found zero intercepts occurring in many cases in our models using the raw Robyn code, and each time we recognize that the coefficients have been biased to make the zero intercept work. We believe that allowing negative intercepts fixes this and does not directly introduce any problems. (You may find a problem around how to "interpret" these intercepts, but that is a topic I will address in a separate issue.)

Our proposed solutions, either:

Remove the constraint on the intercept, or
Pass an argument through the system allowing the user to alter the default constraint

Since Robyn is relatively young, perhaps the default could be unconstrained to set this on the right path.

gufengzhou commented 2 years ago

Hello, first of all happy new year! Actually we're still on vacation but you're making a very interesting case so let me try to address it quickly :)

1st of all, Robyn does standardize the X matrix because glmnet does so by default. To quote the documentation of the argument standardize: it "is a logical flag for x variable standardization prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize = TRUE."
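
To illustrate what "standardize, then return coefficients on the original scale" implies, here is a generic closed-form ridge sketch. It is not glmnet's algorithm (glmnet uses coordinate descent and its own penalty scaling), and the function name ridge_standardized and the data are made up for illustration. The point is only that, with an unconstrained intercept, rescaling or shifting a column leaves the fit unchanged.

```python
import numpy as np

def ridge_standardized(X, y, lam=1.0):
    """Ridge fit on standardized columns; coefficients on the original scale."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sd                      # standardized design matrix
    yc = y - y.mean()                       # centered target
    p = Xs.shape[1]
    beta_std = np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)
    beta = beta_std / sd                    # back to original units
    intercept = y.mean() - mu @ beta        # unconstrained intercept
    return beta, intercept

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 3.0 + 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.1, size=200)

beta, b0 = ridge_standardized(X, y)

# Multiply the first column by 100: its coefficient shrinks by 100.
X_scaled = X.copy()
X_scaled[:, 0] *= 100.0
beta_s, b0_s = ridge_standardized(X_scaled, y)

# Add 100 to the first column: slopes are unchanged and the
# intercept moves by -100 * beta[0] to absorb the shift.
X_shift = X.copy()
X_shift[:, 0] += 100.0
beta_h, b0_h = ridge_standardized(X_shift, y)
```

In this sketch the shift is absorbed entirely by the intercept, which is only possible when the intercept is free to take any sign.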

2nd, by using the multi-objective capability from Nevergrad, Robyn is always optimizing towards at least two objective functions (NRMSE & DECOMP.RSSD). In other words, it never ONLY looks for the "best fitting line" as you illustrate in the screenshot, but ALSO tries to minimize the decomposition distance. Therefore, the beta coefficients will never be the pure "best fit" solutions that a regression package would produce.

3rd one is kind of philosophical. Assuming you stop all activities, the equation will reduce to y_hat = intercept, where y_hat represents the "true baseline sales", whatever that might be. Intuitively, it's hard to justify a negative baseline. If we agree on this, then in your second plot, a straight line that intercepts y below 0 can't possibly be the true relationship between x & y. The true relationship is very probably nonlinear, even though we'll never know.

Having said that, I do understand the case you're making and it won't be too difficult to allow a negative intercept. We could create a new argument intercept_sign that defaults to positive, while users like you who know what they're doing can set it to unconstrained. I'll put it on the new feature list. Hope the explanation makes sense.

extrospective commented 2 years ago

Re: glmnet. Very useful feedback. Thanks, will review.

Re: multi-objective. I think Robyn should work for a single paid media variable as well as multiple paid media variables. I can imagine a company having one paid media variable initially and then adding a second marketing channel and expanding. Thinking about the "limit of Robyn as paid marketing variables decrease to 1" then we have also done testing which focuses on one paid media variable and specifically on NRMSE (decomp.rssd=0 always with one paid media variable). In short, I do not think the multi-objective capability is relevant in every case. (If you feel Robyn cannot be used with one paid media variable, that would be interesting; I did not see that in any page.) This paradigm is used for some of our tests of Robyn; thinking that Robyn should work both for one paid media variable and multiple paid media variables.
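
For readers following along, here is a rough sketch of the two objectives as I understand them. The definitions are assumptions, not copied from Robyn's source: NRMSE taken as RMSE divided by the range of the observed target, and DECOMP.RSSD as the root sum of squared differences between each paid channel's spend share and effect share. The numbers are made up.

```python
import numpy as np

def nrmse(y_true, y_pred):
    """RMSE normalized by the range of the observed target (assumed definition)."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(rmse / (y_true.max() - y_true.min()))

def decomp_rssd(spend, effect):
    """Distance between spend shares and effect shares (assumed definition)."""
    spend_share = spend / spend.sum()
    effect_share = effect / effect.sum()
    return float(np.sqrt(np.sum((effect_share - spend_share) ** 2)))

# One paid channel: both shares are 1, so the distance is always 0.
one_channel = decomp_rssd(np.array([1000.0]), np.array([250.0]))

# Two paid channels whose effect shares differ from their spend shares.
two_channels = decomp_rssd(np.array([800.0, 200.0]), np.array([100.0, 150.0]))

fit_error = nrmse(np.array([10.0, 20.0, 30.0]), np.array([12.0, 18.0, 31.0]))
```

Under these definitions the single-channel case reduces to minimizing NRMSE alone, which is the limit described above.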

Re: baseline sales. If you have a context variable not centered at zero, as we have in this constructed variable, then the "true baseline sales" is not negative simply because the intercept is negative. The intercept on its own does not represent the true baseline sales; the true baseline sales is the intercept plus the sum-product of all non-paid media variables with their coefficients. I plan to supply a more detailed writeup on why we feel this way, as I realize it is not immediately apparent. The argument in short: paid media variables generally start from 0 and their contribution increases monotonically, but that is not a requirement for context variables, nor does it apply to Prophet output. So the "baseline sales" would actually be all non-paid media variables set at reasonable baseline values, multiplied by their coefficients, plus the intercept. You can have a positive baseline sales with a very negative intercept.
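
A tiny worked example of that arithmetic, with made-up numbers and a hypothetical variable name (coupon_shifted), just to show that a negative intercept and a positive baseline can coexist:

```python
# Hypothetical fitted values for a model with one context variable.
intercept = -150.0
context_coefs = {"coupon_shifted": 2.0}

# "Reasonable baseline" value of the context variable: a shifted
# difference sits at 100 when the coupon equals its recent average.
context_baseline = {"coupon_shifted": 100.0}

# Baseline sales with all paid media at zero:
# intercept plus the sum-product of context baselines and coefficients.
baseline_sales = intercept + sum(
    context_coefs[name] * context_baseline[name] for name in context_coefs
)
# baseline_sales is 50.0: positive, despite the intercept of -150.
```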

Failure to allow negative intercepts creates incorrect solutions when the context variables are neither centered on zero nor anchored at zero. If the context variable has a large mean, for example, then the intercept needs to go negative to offset the context variable.

I do not know whether setting additional constraints on the context variables (such as centering at zero) would fully resolve this issue. But zero-centering is not a requirement for context variables in Robyn documentation today.

extrospective commented 2 years ago

Received a note from @slavakx but cannot see the link here. Strange. Anyway, he asked whether I had factor variables which could be sent off to Prophet, and the answer is that we have both factor and non-factor context variables, so this remains an issue for us.

slavakx commented 2 years ago

@extrospective Thanks for your reply. I removed my thread once I realized that I missed your point about negative intercept in general.

gufengzhou commented 2 years ago

Hi, sorry for the late response due to our resource constraints at the beginning of the year. We've discussed your use case and find it completely legitimate. We'll add an extra argument in the modelling function to allow a negative intercept this month. Thanks for the very thorough description!

joangcc commented 2 years ago

Thanks for your change. I'm working with @extrospective on the same team. This change will surely help us in our case.

laresbernardo commented 2 years ago

Hi @extrospective + @joangcc thanks for bringing this up! We've enabled a new intercept_sign param which is set by default to "non_negative" but can be changed to "unconstrained". Thanks for your feedback and we will internally continue the debate whether we should change the default or not. Really interesting points.

facebookexperimental / Robyn

Why Robyn Should Allow Negative Intercepts #256