Redshift does not handle either covar or covar_pop, which means default__regress doesn't work. There may be other databases where this is true.
Current solution
The best solution is to use the chol method, which never relies on the default__regress macro. So the problem is fixed by simply using the suggested method for calculating regression coefficients.
I will make a note in the README about this.
If a user utilizes the fwl method, in the case of databases that do not support covar_pop, a compilation error should be raised, since it is not possible to calculate without two select statements. The compilation error should suggest using chol.
How to fix for real
This is actually fixable.
The problem is, frankly, I don't have the wherewithal these days to support this project in excruciating depth. (Sorry, I'm busy on other things.) Because this issue is not something that should impact Redshift users unless they deviate from the suggested method of using chol, I don't consider this critical enough to spend time on.
If anyone reading this wants to fix it themselves, there is a solution if you are willing to contribute a PR. For databases that do not support covar_pop, an additional preprocessing step needs to be done. It looks kinda like this:
with
step1 as (
select
grp,
avg(x1) over (partition by grp) as avg_x1,
avg(y) over (partition by grp) as avg_y,
x1,
y
from data
),
step2 as (
select
grp,
avg((x1 - avg_x1) * (y - avg_y)) / var_pop(x1) as x1_slope,
avg(y) - avg(x1) * avg((x1 - avg_x1) * (y - avg_y)) / var_pop(x1) as intercept
from step1
group by grp
)
select *
from step2
Basically, it is possible to calculate covar_pop, but only by using two select statements, the first one calculating the average of the column.
That said, the code is a mess (in fairness to past me, how could it not be? This is jinja2 + SQL abuse dialed up to 11). And this preprocessing step should only occur for databases where covar_pop is not a supported method. I will not accept a PR that adds the window funcs to dbs that actually support covar_pop as it is a superfluous calculation in those cases. Adding this will be a lot of work, and I don't recommend diving down this rabbit hole. Please consider just using chol, it will make everyone's life easier. 😄 But, if you are insane enough to add this and you meet all the requirements, I salute you and I will get your change in.
Redshift does not handle either
covar
orcovar_pop
, which meansdefault__regress
doesn't work. There may be other databases where this is true.Current solution
The best solution is to use the
chol
method, which never relies on thedefault__regress
macro. So the problem is fixed by simply using the suggested method for calculating regression coefficients.I will make a note in the README about this.
If a user utilizes the
fwl
method, in the case of databases that do not supportcovar_pop
, a compilation error should be raised, since it is not possible to calculate without two select statements. The compilation error should suggest usingchol
.How to fix for real
This is actually fixable.
The problem is, frankly, I don't have the wherewithal these days to support this project in excruciating depth. (Sorry, I'm busy on other things.) Because this issue is not something that should impact Redshift users unless they deviate from the suggested method of using
chol
, I don't consider this critical enough to spend time on.If anyone reading this wants to fix it themselves, there is a solution if you are willing to contribute a PR. For databases that do not support
covar_pop
, an additional preprocessing step needs to be done. It looks kinda like this:Basically, it is possible to calculate
covar_pop
, but only by using two select statements, the first one calculating the average of the column.That said, the code is a mess (in fairness to past me, how could it not be? This is jinja2 + SQL abuse dialed up to 11). And this preprocessing step should only occur for databases where
covar_pop
is not a supported method. I will not accept a PR that adds the window funcs to dbs that actually supportcovar_pop
as it is a superfluous calculation in those cases. Adding this will be a lot of work, and I don't recommend diving down this rabbit hole. Please consider just usingchol
, it will make everyone's life easier. 😄 But, if you are insane enough to add this and you meet all the requirements, I salute you and I will get your change in.