wukan1986 closed this issue 6 months ago.
Thank you for the issue. I am looking into this, but I'm not really sure why it happens yet.
A quick fix could be:
df.with_columns( pl.col('A').num.lstsq(*[pl.col(c) for c in df.columns if c.startswith("B_")], return_pred=True).struct.field('resid') )
You could also wrap the list comprehension in a function (selecting the columns with a regex) to make this shorter, but I know it is better to do this with Polars alone.
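A minimal sketch of such a helper, assuming the regressors are exactly the columns whose names fully match a pattern. `regressor_columns` is a hypothetical name, not part of Polars or polars_ds. The helper only needs the standard library, and the Polars call is shown as a comment.

```python
import re
from typing import Sequence


def regressor_columns(columns: Sequence[str], pattern: str = r"^B_.*$") -> list[str]:
    """Return the column names that fully match `pattern`, in their original order.

    Hypothetical helper: the name and the default pattern are illustrative only.
    """
    regex = re.compile(pattern)
    return [c for c in columns if regex.fullmatch(c)]


# Usage with the quick fix above (not executed here, needs polars + polars_ds):
#   xs = [pl.col(c) for c in regressor_columns(df.columns)]
#   df.with_columns(
#       pl.col("A").num.lstsq(*xs, return_pred=True).struct.field("resid")
#   )
print(regressor_columns(["A", "B_1", "C", "B_2"]))  # ['B_1', 'B_2']
```

Keeping the column selection in plain Python means the expression passed to `lstsq` always receives an explicit, fixed list of regressors.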
It seems like
df.with_columns(pl.col('A').num.lstsq(pl.col('^B_.*$'), return_pred=True).struct.field('resid'))
is internally being dispatched as
df.with_columns(pl.col('A').num.lstsq(pl.col(x), return_pred=True).struct.field('resid')
for x in df.columns if x.startswith("B_"))
In other words, it runs a separate regression of A on B_1, then A on B_2, and so on, instead of one regression of A on all the B_* columns together.
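To see why the two are not equivalent, here is a small pure-Python sketch (no Polars). `lstsq` and `residuals` are hypothetical helpers that solve the normal equations without an intercept; they are not the polars_ds implementation. When A is an exact combination of B_1 and B_2, the joint regression leaves zero residuals, while each per-column regression does not.

```python
def lstsq(y, xs):
    """Coefficients of y regressed on the columns in xs (no intercept)."""
    k = len(xs)
    # Augmented normal equations [X'X | X'y], one row per regressor.
    aug = [
        [sum(p * q for p, q in zip(xs[i], xs[j])) for j in range(k)]
        + [sum(p * q for p, q in zip(xs[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def residuals(y, xs):
    """y minus the fitted values from regressing y on all columns in xs at once."""
    beta = lstsq(y, xs)
    return [yt - sum(b * x[t] for b, x in zip(beta, xs)) for t, yt in enumerate(y)]


b_1 = [1.0, 2.0, 3.0, 4.0]
b_2 = [1.0, 0.0, 1.0, 0.0]
col_a = [x1 + 2 * x2 for x1, x2 in zip(b_1, b_2)]  # A = B_1 + 2 * B_2 exactly

# One regression of A on B_1 and B_2 together (what the user wants).
joint = residuals(col_a, [b_1, b_2])
# One regression per B_* column (what the regex expansion effectively runs).
separate = [residuals(col_a, [b]) for b in (b_1, b_2)]

print([round(r, 6) for r in joint])  # numerically zero
print(separate[1])  # [-1.0, 2.0, 1.0, 4.0]
```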
There are two ways you could fix this:
1. concat_list, which takes a Vec of expressions, so all the X columns arrive as a single list column; or
2. a struct: pass args=[pl.struct(variables), pl.lit(add_bias, dtype=pl.Boolean)], and then on the Rust side, since you turn the X vars into a new DataFrame anyway, I think you can build it directly from the struct (note that rename takes &mut self, so rename an owned copy rather than &inputs[1]):
   let struct_col = inputs[1].clone().with_name("Xvars");
   let df_x = DataFrame::new(vec![struct_col.into()]).unwrap().unnest(vec!["Xvars"]).unwrap();
I think inputs[0] is the y var, inputs[1] would be the struct, and inputs[2] would be the bool.
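To make the argument layout concrete, here is a pure-Python analogue of the struct approach. It is a sketch under stated assumptions, not the plugin's actual code: a dict stands in for the struct column, and unpacking it plays the role of unnesting "Xvars". `unpack_inputs` and the `__bias__` column name are hypothetical.

```python
def unpack_inputs(inputs):
    """Split [y, struct of X columns, add_bias] into y, regressor names and design rows.

    Hypothetical Python analogue of the Rust-side logic: the struct column is
    modelled as a dict of field name -> values, so unpacking it plays the role
    of unnesting "Xvars" into one column per regressor.
    """
    y, x_struct, add_bias = inputs  # inputs[0], inputs[1], inputs[2]
    names = list(x_struct)  # field order == regressor order
    columns = [x_struct[n] for n in names]
    if add_bias:
        names.append("__bias__")  # hypothetical name for the intercept column
        columns.append([1.0] * len(y))
    rows = [list(r) for r in zip(*columns)]  # one row per observation
    return y, names, rows


# On the Polars side (not executed here), a regex inside pl.struct is packed into
# ONE struct expression instead of being expanded into one call per column:
#   df.select(pl.struct(pl.col("^B_.*$")).alias("Xvars")).unnest("Xvars")
y, names, rows = unpack_inputs([[1.0, 2.0], {"B_1": [3.0, 4.0], "B_2": [5.0, 6.0]}, True])
print(names)  # ['B_1', 'B_2', '__bias__']
print(rows)   # [[3.0, 5.0, 1.0], [4.0, 6.0, 1.0]]
```

The design point is that the regressors travel as a single argument, so the number of inputs stays fixed at three no matter how many B_* columns the regex matches.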
Does not support regular expressions