aalfons / robustHD

Robust methods for high-dimensional data, in particular linear model selection techniques based on least angle regression and sparse regression.
GNU General Public License v3.0

R crashes if I reduce the training sample size in the nci60 example #38

Closed lulukang closed 2 years ago

lulukang commented 2 years ago

R consistently crashes when I run the following R commands with the robustHD package. They are based on your data and code from the JOSS paper.

library("robustHD")
data("nci60")
y <- protein[, 92]
correlations <- apply(gene, 2, corHuber, y)
keep <- partialOrder(abs(correlations), 100, decreasing = TRUE)
X <- gene[, keep]
n <- nrow(X)

lambda_trim <- seq(0.01, 0.5, length.out = 10)
training <- sample(1:n, size = 40)
testing <- setdiff(1:n, training)

fit_trim <- sparseLTS(X[training, ], y[training], lambda = lambda_trim, mode = "fraction", crit = "PE", splits = foldControl(K = 4, R = 1))

R also crashes if I increase the training sample size to 50 instead of 40. I also tried several other numbers of folds, without success.

I tried it on both Windows (R x64 4.1.1) and macOS (R 4.2.0), with and without RStudio. There is no warning or error message; R simply quits.
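Because the crash takes down the whole session without any message, one way to probe several training sample sizes without losing the interactive session is to run each attempt in a separate Rscript process and look only at its exit status. This is a minimal sketch, assuming robustHD is installed and Rscript lives in `R.home("bin")`; the helper name `run_in_child` and the seed value are hypothetical choices for illustration, not part of robustHD.

```r
# Hypothetical helper (not part of robustHD): run one reproduction attempt in a
# fresh Rscript process so that a crash kills only the child, not this session.
run_in_child <- function(size, seed = 1) {
  # Script text for the child; the seed is arbitrary and only makes runs repeatable
  code <- sprintf('
    library("robustHD")
    data("nci60")
    set.seed(%d)
    y <- protein[, 92]
    correlations <- apply(gene, 2, corHuber, y)
    keep <- partialOrder(abs(correlations), 100, decreasing = TRUE)
    X <- gene[, keep]
    training <- sample(seq_len(nrow(X)), size = %d)
    fit <- sparseLTS(X[training, ], y[training],
                     lambda = seq(0.01, 0.5, length.out = 10),
                     mode = "fraction", crit = "PE",
                     splits = foldControl(K = 4, R = 1))
  ', seed, size)
  script <- tempfile(fileext = ".R")
  writeLines(code, script)
  on.exit(unlink(script))
  rscript <- file.path(R.home("bin"), "Rscript")
  # Discard the child's output; system2() then returns its exit status
  status <- system2(rscript, shQuote(script), stdout = FALSE, stderr = FALSE)
  status == 0  # TRUE if the child finished normally, FALSE if it crashed or failed
}

# Try a few training sample sizes
res <- sapply(c(40, 50, 60), run_in_child)
print(res)
```

A `FALSE` entry only says that the child did not exit cleanly; rerunning that script directly with Rscript shows whether it was a crash or an ordinary R error.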

aalfons commented 2 years ago

I think I fixed the bug. I could reproduce the crash a few weeks ago, but after sorting out an issue with environments in sparseLTS(), the example now runs without crashing on my machine (macOS, R 4.2.0):

> library("robustHD")
Loading required package: ggplot2
Loading required package: perry
Loading required package: parallel
Loading required package: robustbase
> 
> data("nci60")
> y <- protein[, 92]
> correlations <- apply(gene, 2, corHuber, y)
> keep <- partialOrder(abs(correlations), 100, decreasing = TRUE)
> X <- gene[, keep]
> n <- nrow(X)
> 
> lambda_trim <- seq(0.01, 0.5, length.out = 10)
> training <- sample(1:n,size=40)
> testing <- setdiff(1:n, training)
> 
> fit_trim <- sparseLTS(X[training,], y[training], lambda = lambda_trim, mode = "fraction", crit = "PE", splits = foldControl(K = 4, R = 1))
> fit_trim

4-fold CV results:
        lambda reweighted       raw
1  0.406814826  1.8023921 1.9018965
2  0.362517211  1.4190589 1.5440126
3  0.318219597  1.3177430 1.4695792
4  0.273921983  1.0226306 1.1346821
5  0.229624368  0.9114529 1.0630557
6  0.185326754  0.8240450 0.9999025
7  0.141029140  0.7354960 0.9124275
8  0.096731525  0.8258429 0.8823456
9  0.052433911  0.7993018 0.8120302
10 0.008136297  0.6578892 0.6578892

Optimal lambda:
 reweighted         raw 
0.008136297 0.008136297 

Final model:

Call:
sparseLTS(x = X[training, ], y = y[training], lambda = 0.00813629651262027)

Coefficients:
  (Intercept)          8502         20929          1367           607          4454 
-9.1598328579  0.3310042335  0.0863775797  0.0002349935  0.1480977384  0.1319091169 
         1106         20125          2192          8510          8119          4717 
 0.0105991949 -0.0027565131 -0.0990009605  0.2099338113  0.0549507631  0.2071540476 
         8460          8120         18447         11697         18057          1209 
-0.0371748469  0.0196347548 -0.0963591076  0.1478456541  0.0945773068  0.0646499578 
        10193         16601          7696          8706         16784          3482 
 0.0015184415  0.0396195287  0.0346881311  0.0666896146  0.0254173677  0.4564974060 
        11829 
 0.0159657223 

Penalty parameter:       0.008136297
Residual scale estimate: 0.100884972
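The lambda values in the CV table are not the values in `lambda_trim`. As I understand the robustHD documentation, with `mode = "fraction"` the values of `lambda` are treated as fractions of an estimate of the smallest penalty that sets all coefficients to zero, so the printed values should be `lambda_trim` rescaled by one common factor and listed in decreasing order. The R sketch below checks this using only numbers copied from the output above, and checks that the reported optimal lambda is the one with the smallest CV prediction error. The names `cv_pe`, `ratio` and `best` are just for illustration.

```r
# Grid of fractions passed to sparseLTS() in the example above
lambda_trim <- seq(0.01, 0.5, length.out = 10)

# CV prediction errors copied from the printed 4-fold CV results
cv_pe <- data.frame(
  lambda = c(0.406814826, 0.362517211, 0.318219597, 0.273921983, 0.229624368,
             0.185326754, 0.141029140, 0.096731525, 0.052433911, 0.008136297),
  reweighted = c(1.8023921, 1.4190589, 1.3177430, 1.0226306, 0.9114529,
                 0.8240450, 0.7354960, 0.8258429, 0.7993018, 0.6578892),
  raw = c(1.9018965, 1.5440126, 1.4695792, 1.1346821, 1.0630557,
          0.9999025, 0.9124275, 0.8823456, 0.8120302, 0.6578892)
)

# Each printed lambda divided by the matching fraction (grid sorted in
# decreasing order) should give the same scaling factor
ratio <- cv_pe$lambda / sort(lambda_trim, decreasing = TRUE)
print(range(ratio))

# Lambda with the smallest CV prediction error, for each fit
best <- c(reweighted = cv_pe$lambda[which.min(cv_pe$reweighted)],
          raw = cv_pe$lambda[which.min(cv_pe$raw)])
print(best)
```

All ratios come out at about 0.8136, and both entries of `best` equal 0.008136297, matching the "Optimal lambda" lines. This is also the smallest value on the grid.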

@lulukang, could you install the latest version from GitHub using

devtools::install_github("aalfons/robustHD")

and check if it works for you now?

You'll need to install the devtools package if you don't have it already, as well as the necessary C++ compilers: Rtools on Windows or Xcode on macOS.
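The installation steps above can be wrapped in a small helper. This is a sketch, assuming a working internet connection; `install_robustHD_dev` is a hypothetical name, not a function from robustHD or devtools. It relies on `devtools::has_devel()` to check for a compiler toolchain and on the `"user/repo@ref"` form of `install_github()` to optionally pin a specific commit.

```r
# Hypothetical helper (not part of any package): install the development
# version of robustHD from GitHub, optionally at a given commit or branch.
install_robustHD_dev <- function(ref = NULL) {
  # Install devtools only if it is missing
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # Check for a compiler toolchain (Rtools on Windows, Xcode on macOS)
  if (!isTRUE(devtools::has_devel())) {
    stop("No working compiler toolchain found; install Rtools or Xcode first.")
  }
  # "aalfons/robustHD@<ref>" installs a specific commit or branch
  repo <- if (is.null(ref)) "aalfons/robustHD" else paste0("aalfons/robustHD@", ref)
  devtools::install_github(repo)
  # Report the installed version
  packageVersion("robustHD")
}
```

Calling `install_robustHD_dev()` installs the current development version; passing a commit hash as `ref` installs that exact commit instead.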

aalfons commented 2 years ago

Since I can no longer reproduce the crash on my machine, it seems to be fixed in commit cd340ee55072221ef993f6440e3b4bb04f147b1e.

I'll therefore close this issue now. @lulukang, feel free to reopen it if R still crashes on your machines.