azmyrajab / polars_ols

Polars least squares extension - enables fast linear model Polars expressions
MIT License
102 stars, 9 forks

null_policy not working properly when used with RollingKwargs #15

Closed: stout-yeoman closed this issue 5 months ago

stout-yeoman commented 5 months ago

Bug Description

When null_policy is set to "drop" in RollingKwargs, the library raises a polars.exceptions.ComputeError with the message: "the plugin panicked".

To Reproduce

Snippet that reproduces behaviour:

import polars as pl
import polars_ols as pls

# Ten rows with a single null in "y" (row 7) to exercise the null policy.
df = pl.DataFrame({
    "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "y": [3.496714, 6.861736, 8.647689, 10.523030, 10.765847, 14.765863, None, 19.530526, 20.542560, 21.536857]
})
regression_coefficients = pls.compute_rolling_least_squares(
    "y",
    pl.col("x"),
    add_intercept=True,
    mode="coefficients",
    rolling_kwargs=pls.RollingKwargs(window_size=3, null_policy="drop"),
)
df.with_columns(regression_coefficients)

Expected behavior

I expected the rolling operation to successfully drop the null values and perform the regression on the remaining samples in the window.

Actual behavior

The operation fails, and the following error is raised: polars.exceptions.ComputeError: the plugin panicked

Environment

polars version: 0.20.18
polars-ols version: 0.2.9
Python version: 3.12
Operating System: macOS Ventura 13.6.6

Additional context

This error occurs when null_policy is set to "drop". I haven't checked the other null_policy options.

Please let me know if you need any more information or if there are any workarounds available.

azmyrajab commented 5 months ago

Hi @stout-yeoman, thanks for using this package and raising your issue, and for the very clear reproduction and problem description!

I think that upgrading to the latest polars-ols, v0.3.0, should resolve your issue. The PR that corrects null policy handling was included in that PyPI release. Apologies, I should probably maintain a changelog.

Would you mind running pip install polars-ols --upgrade and letting me know if this has helped?
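
To confirm which release is actually installed before and after upgrading, something like the following could help. It is only a sketch: FIXED_IN is taken from the v0.3.0 mentioned above, and parse_version, needs_upgrade and installed_polars_ols_version are hypothetical helper names used for illustration, not part of polars-ols.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Release said in this thread to contain the null policy fix.
FIXED_IN = (0, 3, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.2.9' into (0, 2, 9), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_upgrade(installed: str) -> bool:
    """True if the installed version is older than FIXED_IN."""
    return parse_version(installed) < FIXED_IN


def installed_polars_ols_version() -> Optional[str]:
    """Return the installed polars-ols version, or None if it is absent."""
    try:
        return version("polars-ols")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_polars_ols_version()
    if found is None:
        print("polars-ols is not installed")
    elif needs_upgrade(found):
        print(f"polars-ols {found} is older than 0.3.0; upgrade suggested")
    else:
        print(f"polars-ols {found} should include the fix")
```

The comparison only looks at the first three numeric components, which is enough to tell 0.2.9 from 0.3.0.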

This is what I get, for reference:

import polars as pl
import polars_ols as pls

df = pl.DataFrame({
    "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "y": [3.496714, 6.861736, 8.647689, 10.523030, 10.765847, 14.765863, None, 19.530526, 20.542560, 21.536857]
})
regression_coefficients = pls.compute_rolling_least_squares(
    "y",
    pl.col("x"),
    add_intercept=True,
    mode="coefficients",
    rolling_kwargs=pls.RollingKwargs(window_size=3, null_policy="drop"),
)
print(df.with_columns(regression_coefficients))

shape: (10, 3)
┌─────┬───────────┬──────────────────────┐
│ x   ┆ y         ┆ coefficients         │
│ --- ┆ ---       ┆ ---                  │
│ i64 ┆ f64       ┆ struct[2]            │
╞═════╪═══════════╪══════════════════════╡
│ 1   ┆ 3.496714  ┆ {null,null}          │
│ 2   ┆ 6.861736  ┆ {3.365022,0.131692}  │
│ 3   ┆ 8.647689  ┆ {2.575488,1.184405}  │
│ 4   ┆ 10.52303  ┆ {1.830647,3.185544}  │
│ 5   ┆ 10.765847 ┆ {1.059079,5.742539}  │
│ 6   ┆ 14.765863 ┆ {2.121417,1.411164}  │
│ 7   ┆ null      ┆ {2.121417,1.411164}  │
│ 8   ┆ 19.530526 ┆ {2.844527,-2.994593} │
│ 9   ┆ 20.54256  ┆ {1.990818,3.016712}  │
│ 10  ┆ 21.536857 ┆ {1.003166,11.508158} │
└─────┴───────────┴──────────────────────┘
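
The table above appears consistent with the following reading of null_policy="drop": rows containing a null are removed, the window covers the last 3 remaining observations, and a null row repeats the most recent coefficients. The pure-Python sketch below illustrates that reading for the single-feature case. It is inferred from the printed output only, not from the library's source; fit_line, rolling_ols_drop_nulls and the min_periods=2 default are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Coef = Optional[Tuple[float, float]]  # (slope, intercept), or None if undefined


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least squares for y = slope * x + intercept.

    Assumes the x values are not all identical (otherwise sxx is zero).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rolling_ols_drop_nulls(
    x: Sequence[Optional[float]],
    y: Sequence[Optional[float]],
    window_size: int = 3,
    min_periods: int = 2,  # assumption: inferred from the first row being null
) -> List[Coef]:
    """Rolling fit over the last `window_size` non-null observations."""
    valid: List[Tuple[float, float]] = []  # non-null (x, y) pairs seen so far
    last: Coef = None
    out: List[Coef] = []
    for xi, yi in zip(x, y):
        if xi is not None and yi is not None:
            valid.append((xi, yi))
            window = valid[-window_size:]
            if len(window) >= min_periods:
                xs, ys = zip(*window)
                last = fit_line(xs, ys)
        # A null row adds no observation, so `last` is simply repeated.
        out.append(last)
    return out


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.496714, 6.861736, 8.647689, 10.523030, 10.765847,
     14.765863, None, 19.530526, 20.542560, 21.536857]

for xi, coef in zip(x, rolling_ols_drop_nulls(x, y)):
    print(xi, coef)
```

Under this reading, the window for row 8 is built from x = 5, 6 and 8, because row 7 was dropped rather than counted as one of the three samples.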

stout-yeoman commented 5 months ago

Hey @azmyrajab,

So it does! I can confirm that the issue is indeed resolved with the latest version. Great work on the package!