zexumath closed this issue 1 month ago
Hi @zexumath, thanks for raising the issue. I will try to reproduce and fix this.
I can't see anything obvious that would cause this behaviour, and my tests don't indicate issues for normal (well-conditioned) data.
The code path for the formula API eventually calls the same compute_least_squares entry point with add_intercept=True, which adds an additional feature in the same manner (pl.lit(1.0)).
A few questions, just to rule things out:
If neither of the above applies, I'd appreciate it if you could post a reproducible example that I could work on.
    if add_intercept:
        if any(f.meta.output_name == "const" for f in features):
            logger.warning("feature named 'const' already detected, assuming it is the intercept")
        else:
            features += (pl.lit(1.0).alias("const"),)
I'm using pls.compute_least_squares_from_formula with solve_method='svd', but I'm getting different results when using the following two formulas:
where intercept is just the column pl.lit(1.0). The second one matches the results from statsmodels, whereas the first one can produce very large coefficients when the inputs are ill-conditioned. To spot this issue, I'm using ~5000 records and ~40 features.
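To illustrate why ill-conditioned inputs can give very large coefficients from an SVD-based solve, here is a small NumPy sketch. It does not use or reproduce the library's internals: `svd_lstsq`, its `rcond` cutoff, and the synthetic data are all assumptions made for illustration, and it is not a claim about the cause of the difference reported above.

```python
import numpy as np


def svd_lstsq(X, y, rcond):
    """Least squares via SVD, zeroing singular values below rcond * s_max."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > rcond * s[0]          # singular values are sorted descending
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]      # truncated pseudo-inverse
    return Vt.T @ (s_inv * (U.T @ y))


rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = x1 + 1e-9 * rng.normal(size=n)        # nearly collinear with x1
X = np.column_stack([x1, x2, np.ones(n)])  # last column is the intercept
y = x1 + 0.5 + 0.1 * rng.normal(size=n)

beta_loose = svd_lstsq(X, y, rcond=1e-6)   # drops the near-zero singular value
beta_tight = svd_lstsq(X, y, rcond=1e-15)  # keeps it, so noise is divided by a tiny number

print("loose cutoff:", beta_loose)
print("tight cutoff:", beta_tight)
```

Both solutions give almost the same fitted values, but the one that keeps the tiny singular value has coefficients many orders of magnitude larger. If two code paths end up with different effective cutoffs or slightly different design matrices, this kind of divergence is one thing worth ruling out.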