azmyrajab / polars_ols

Polars least squares extension - enables fast linear model polar expressions
MIT License

ndarray: index 0 is out of bounds for array of shape [0] #5

Closed wukan1986 closed 5 months ago

wukan1986 commented 5 months ago

My data quality is not very good, and a column may be entirely null:

import polars as pl
import polars_ols as pls  # noqa
from polars_ols.least_squares import OLSKwargs

df = pl.DataFrame({
    "A": [None, None, None, None],
    "B": [1, 2, 3, 4],
})

df = df.with_columns(pls.compute_least_squares(pl.col('A'),
                                               pl.col('B'),
                                               mode='residuals', ols_kwargs=OLSKwargs(null_policy='drop', solve_method='svd')).alias('resid'))
print(df)
"""
panicked at src/least_squares.rs:86:53:
ndarray: index 0 is out of bounds for array of shape [0]
Traceback (most recent call last):
  File "/home/kan/test1/c.py", line 10, in <module>
    df = df.with_columns(pls.compute_least_squares(pl.col('A'),
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kan/miniconda3/envs/py311/lib/python3.11/site-packages/polars/dataframe/frame.py", line 7847, in with_columns
    return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kan/miniconda3/envs/py311/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 1683, in collect
    return wrap_df(ldf.collect())
                   ^^^^^^^^^^^^^
polars.exceptions.ComputeError: the plugin panicked
"""

I would expect resid to be all null:

null
null
null
null

azmyrajab commented 5 months ago

Thanks for raising this. Let me handle the corner case of no valid data gracefully; I will let you know once this is fixed.
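
For readers following along, the "no valid data" corner case can be illustrated with a minimal pure-Python sketch. This is not the plugin's Rust implementation: the function name `drop_null_residuals` is hypothetical, the fit is assumed to be a single-feature least squares without an intercept, and returning null for dropped rows is an assumption about what a "drop" policy could mean here.

```python
from typing import Optional, Sequence


def drop_null_residuals(
    y: Sequence[Optional[float]],
    x: Sequence[Optional[float]],
) -> list[Optional[float]]:
    """Residuals of a no-intercept, single-feature least-squares fit.

    Rows where either value is None are left out of the fit and receive a
    None residual (an assumption made for this sketch). If no valid rows
    remain, every residual is None instead of raising an error.
    """
    # Keep only rows where both the target and the feature are present.
    valid = [(yi, xi) for yi, xi in zip(y, x) if yi is not None and xi is not None]

    # Sum of squared feature values; zero means the coefficient is undefined.
    sxx = sum(xi * xi for _, xi in valid)

    # Guard for the corner case: nothing to fit, so return all nulls.
    # Treating an all-zero feature the same way is a choice made for this sketch.
    if not valid or sxx == 0:
        return [None] * len(y)

    # Closed-form coefficient for y ~ beta * x on the valid rows.
    beta = sum(yi * xi for yi, xi in valid) / sxx

    return [
        None if yi is None or xi is None else yi - beta * xi
        for yi, xi in zip(y, x)
    ]


# The data from the report: the target column is entirely null.
print(drop_null_residuals([None, None, None, None], [1, 2, 3, 4]))
```

The point of the guard is that an empty set of valid rows is detected before any fitting or indexing happens, which is the step that was missing when the panic in the report occurred.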

azmyrajab commented 5 months ago

Hi @wukan1986

This should be fixed in the latest commit https://github.com/azmyrajab/polars_ols/commit/567ab2d1176dbd3f965f41dec8366cf008290816, specifically line 304 of expressions.rs. I will release a new version shortly, once the CI tests clear.

This test, in which every row has a null in at least one column, should now pass:

def test_all_empty_data():
    df = pl.DataFrame(
        {
            "A": [None, 2, None, 4],
            "B": [1, None, 3, None],
        }
    )
    df = df.with_columns(
        pl.col("A")
        .least_squares.ols(
            pl.col("B"),
            mode="residuals",
            null_policy="drop",
            solve_method="svd",
        )
        .alias("residuals")
    )
    assert df["residuals"].is_null().all()

azmyrajab commented 5 months ago

This should be resolved now.

wukan1986 commented 5 months ago

Thank you very much