abess-team / abess

Fast Best-Subset Selection Library
https://abess.readthedocs.io/

All types of tune go wrong in PCA model #484

Closed bbayukari closed 1 year ago

bbayukari commented 1 year ago

Describe the bug

All tuning types go wrong in the PCA model, including "gic", "aic", "bic", "ebic", and "cv". Specifically, all information-criterion methods return 0, and the "cv" tuning value decreases monotonically as support_size increases, so it is useless for selecting support_size.

Code for Reproduction

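library(abess)

# Simulate a sparse-PCA dataset: 10000 samples, 5 variables, true support size 3
n <- 10000
p <- 5
support_size <- 3
dataset <- generate.spc.matrix(n, p, support_size, snr = 100, seed = 1)

# Information criteria: report any criterion whose tune.value is identically zero
for (ic_type in c("gic", "aic", "bic", "ebic")) {
  spca_fit <- abesspca(dataset[["x"]], tune.type = ic_type)
  if (all(spca_fit[["tune.value"]] == 0)) {
    print(sprintf("tune.value of %s is all zero!", ic_type))
  }
}

# Cross-validation: report if tune.value is non-increasing in support size
spca_fit2 <- abesspca(dataset[["x"]], tune.type = "cv")
if (!is.unsorted(-spca_fit2[["tune.value"]])) {
  print("tune.value of cv is sorted!")
}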

Results:

[1] "tune.value of gic is all zero!"
[1] "tune.value of aic is all zero!"
[1] "tune.value of bic is all zero!"
[1] "tune.value of ebic is all zero!"
[1] "tune.value of cv is sorted!"

Mamba413 commented 1 year ago

I think this bug is related to a recent pull request: https://github.com/abess-team/abess/pull/477#issue-1561278366

oooo26 commented 1 year ago

Yep, there is a change here:

However, CV may not be a good strategy for PCA. (As you found, it will always select the largest support size.)
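
To illustrate why an unpenalized score tends to favor the largest support size, here is a minimal base R sketch. It does not call abess and does not reproduce abess's internal CV loss; the simulated data, the half/half split, and the names `train_variance` and `heldout_variance` are assumptions made for illustration only. For each support size it picks, by exhaustive search, the subset whose training covariance submatrix has the largest leading eigenvalue, then scores that loading vector on held-out data.

```r
# Illustrative sketch only: base R, no abess calls, hypothetical data.
set.seed(1)
n <- 2000
p <- 5

# Hypothetical data: one latent factor loads on the first 3 variables.
z <- rnorm(n)
x <- matrix(rnorm(n * p), n, p)
x[, 1:3] <- x[, 1:3] + 2 * z

# Split into training and held-out halves.
sigma_train <- cov(x[1:(n / 2), ])
sigma_test <- cov(x[(n / 2 + 1):n, ])

# For each support size s: choose the best subset on the training covariance,
# then evaluate its leading loading vector on the held-out covariance.
scores <- sapply(1:p, function(s) {
  subsets <- combn(p, s, simplify = FALSE)
  top_eig <- sapply(subsets, function(idx) {
    eigen(sigma_train[idx, idx, drop = FALSE], symmetric = TRUE)$values[1]
  })
  best <- subsets[[which.max(top_eig)]]
  v <- eigen(sigma_train[best, best, drop = FALSE], symmetric = TRUE)$vectors[, 1]
  c(train = max(top_eig),
    heldout = drop(t(v) %*% sigma_test[best, best, drop = FALSE] %*% v))
})

train_variance <- scores["train", ]
heldout_variance <- scores["heldout", ]
print(round(rbind(train_variance, heldout_variance), 3))
```

On the training data the explained variance cannot decrease as the support grows, because every subset of size s is contained in some subset of size s + 1 and the leading eigenvalue of a principal submatrix never exceeds that of the larger matrix. With a large sample the held-out score usually follows the same trend, so a loss defined as negative explained variance would keep falling without a complexity penalty.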

Mamba413 commented 1 year ago

I believe this issue has been addressed, so I will remove the bug label.