azmyrajab / polars_ols

Polars least squares extension - enables fast linear model polar expressions
MIT License
111 stars, 9 forks

[BUG] Inconsistent results when manually assigning the intercept column #33

Closed zexumath closed 1 month ago

zexumath commented 2 months ago

I'm using pls.compute_least_squares_from_formula with solve_method='svd', but I'm getting different results from the following two formulas:

formula_1 = 'y ~ <features>'
formula_2 = 'y ~ intercept + <features> -1'

where intercept is just the column pl.lit(1.0). The second formula matches the results from statsmodels, whereas the first can produce very large coefficients when the inputs are ill-conditioned. I spotted this with ~5000 records and ~40 features.
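A report like this can be narrowed down outside the library. The sketch below is not polars_ols code: it uses NumPy's SVD-based `np.linalg.lstsq` on synthetic, deliberately ill-conditioned data and fits the same model with the constant column placed first and then last. Only the sizes (5000 rows, 40 features) come from the report; the data, the seed, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5_000, 40  # sizes taken from the report; the data itself is synthetic

# Deliberately ill-conditioned features: the last column is almost a copy
# of the first one.
X = rng.normal(size=(n, k))
X[:, -1] = X[:, 0] + 1e-4 * rng.normal(size=n)
y = X @ rng.normal(size=k) + 0.5 + rng.normal(size=n)

ones = np.ones((n, 1))
A_first = np.hstack([ones, X])  # constant column first
A_last = np.hstack([X, ones])   # constant column appended last

# np.linalg.lstsq solves the problem with an SVD-based LAPACK driver.
beta_first, *_ = np.linalg.lstsq(A_first, y, rcond=None)
beta_last, *_ = np.linalg.lstsq(A_last, y, rcond=None)

cond = np.linalg.cond(A_first)
fitted_gap = np.max(np.abs(A_first @ beta_first - A_last @ beta_last))

print(f"condition number of the design matrix: {cond:.3g}")
print(f"largest difference in fitted values:   {fitted_gap:.3g}")
```

Fitted values are a more stable thing to compare than individual coefficients. If two intercept specifications give the same fitted values and differ only in the coefficients of near-collinear features, the discrepancy is more likely a conditioning effect than a difference in how the intercept is handled.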

azmyrajab commented 2 months ago

Hi @zexumath, thanks for raising the issue. I'll try to reproduce and fix this.

azmyrajab commented 2 months ago

I can't see anything obvious that would cause this behaviour, and my tests don't indicate any issues for normal (well-conditioned) data.

The code path for the formula API eventually calls the same compute_least_squares entry point with add_intercept=True, which adds an additional feature in the same manner (pl.lit(1.0)).

A few questions, just to rule things out:

If neither of the above applies, I'd appreciate it if you could post a reproducible example that I could work from.

    if add_intercept:
        if any(f.meta.output_name() == "const" for f in features):
            logger.warning("feature named 'const' already detected, assuming it is the intercept")
        else:
            features += (pl.lit(1.0).alias("const"),)