abess-team / abess

Fast Best-Subset Selection Library
https://abess.readthedocs.io/

Apparently Inconsistent Behaviour in LinearRegression - Expected Behaviour or Not? #502

MattWenham closed this issue 1 year ago

MattWenham commented 1 year ago

I am getting apparently inconsistent results from LinearRegression depending on how I specify the support_size.

Essentially, when using np.nonzero(model.coef_) to obtain the support set, I get inconsistent results between the following:

support_size = 1: 'A' is chosen.
support_size = 2: 'A' and 'B' are chosen.
support_size = [1,2]: 'A' and 'C' are chosen.

'A', 'B', and 'C' are all 'correct' to some extent, but one issue I am facing is that I cannot get 'C' to appear in a support set using a single value for support_size unless that value is much larger than it needs to be, in this case support_size = 18. All other arguments are their default values.

Before we delve into what might be happening, I guess I need to ask if this is expected behaviour or not?

Mamba413 commented 1 year ago

This happens because we use a warm-start strategy to speed up computation.

In your example, with support_size = 2 the initial active set might be [A, D]. With support_size = [1,2], however, warm start is enabled automatically, so the search for support_size = 2 begins from a different initial active set (perhaps [A, E]). The two runs can therefore end at different results.
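To illustrate why the starting point can matter, here is a toy sketch in plain NumPy. It is not the abess algorithm. It is a simplified exchange search, written only for this example, that swaps one active column for one inactive column whenever that lowers the residual sum of squares. The names `rss` and `swap_search`, the synthetic data, and the two initial sets (standing in for [A, D] and [A, E]) are all assumptions.

```python
import itertools
import numpy as np

def rss(X, y, active):
    """Residual sum of squares of the least-squares fit on the given columns."""
    cols = sorted(active)
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    resid = y - X[:, cols] @ beta
    return float(resid @ resid)

def swap_search(X, y, init):
    """Swap one active column for one inactive column while that lowers the RSS."""
    active = set(init)
    best = rss(X, y, active)
    improved = True
    while improved:
        improved = False
        for i in sorted(active):
            for j in range(X.shape[1]):
                if j in active:
                    continue
                cand = (active - {i}) | {j}
                val = rss(X, y, cand)
                if val < best - 1e-12:
                    # Accept the first improving swap and restart the scan.
                    active, best, improved = cand, val, True
                    break
            if improved:
                break
    return sorted(active), best

# Correlated synthetic predictors, so several subsets fit almost equally well.
rng = np.random.default_rng(0)
n, p, k = 100, 8, 2
Z = rng.standard_normal((n, 3))
X = Z @ rng.standard_normal((3, p)) + 0.5 * rng.standard_normal((n, p))
y = X[:, 0] + X[:, 1] + rng.standard_normal(n)

cold = swap_search(X, y, init=[0, 3])  # stands in for the initial set [A, D]
warm = swap_search(X, y, init=[0, 4])  # stands in for the initial set [A, E]
exhaustive = min(rss(X, y, c) for c in itertools.combinations(range(p), k))

print("start [0, 3] ->", cold)
print("start [0, 4] ->", warm)
print("best RSS over all size-2 subsets:", exhaustive)
```

A local search like this only guarantees that no single swap improves the fit, so two starting sets may or may not reach the same support. Comparing both against the exhaustive minimum shows whether either run stopped at a local optimum.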

Hope that helps!

MattWenham commented 1 year ago

Many thanks for your swift response. That makes sense, and I have found that using cv and a fixed support_size ensures the results are more in line with expectations.