aalfons / robustHD

Robust methods for high-dimensional data, in particular linear model selection techniques based on least angle regression and sparse regression.
GNU General Public License v3.0

R crashes if I reduce the training sample size of the nci60 example #38

Closed: lulukang closed this issue 2 years ago

lulukang commented 2 years ago

R consistently crashes when I run the following commands with the robustHD package. They are based on the data and code from your JOSS paper.

library("robustHD")
data("nci60")
y <- protein[, 92]
correlations <- apply(gene, 2, corHuber, y)
keep <- partialOrder(abs(correlations), 100, decreasing = TRUE)
X <- gene[, keep]
n <- nrow(X)

lambda_trim <- seq(0.01, 0.5, length.out = 10)
training <- sample(1:n, size = 40)
testing <- setdiff(1:n, training)

fit_trim <- sparseLTS(X[training, ], y[training], lambda = lambda_trim, mode = "fraction", crit = "PE", splits = foldControl(K = 4, R = 1))

R also crashes if I increase the training sample size from 40 to 50. I also tried other numbers of folds, without success.

I tried it on both Windows (R x64 4.1.1) and macOS (R 4.2.0), with and without RStudio. R quits without any warning or error message.

aalfons commented 2 years ago

I think I have fixed the bug. I was able to reproduce the crash a few weeks ago, but after sorting out an issue with environments in sparseLTS(), the example now runs through on my machine (macOS, R 4.2.0):

> library("robustHD")
Loading required package: ggplot2
Loading required package: perry
Loading required package: parallel
Loading required package: robustbase
> data("nci60")
> y <- protein[, 92]
> correlations <- apply(gene, 2, corHuber, y)
> keep <- partialOrder(abs(correlations), 100, decreasing = TRUE)
> X <- gene[, keep]
> n <- nrow(X)
> lambda_trim <- seq(0.01, 0.5, length.out = 10)
> training <- sample(1:n,size=40)
> testing <- setdiff(1:n, training)
> fit_trim <- sparseLTS(X[training,], y[training], lambda = lambda_trim, mode = "fraction", crit = "PE", splits = foldControl(K = 4, R = 1))
> fit_trim

4-fold CV results:
        lambda reweighted       raw
1  0.406814826  1.8023921 1.9018965
2  0.362517211  1.4190589 1.5440126
3  0.318219597  1.3177430 1.4695792
4  0.273921983  1.0226306 1.1346821
5  0.229624368  0.9114529 1.0630557
6  0.185326754  0.8240450 0.9999025
7  0.141029140  0.7354960 0.9124275
8  0.096731525  0.8258429 0.8823456
9  0.052433911  0.7993018 0.8120302
10 0.008136297  0.6578892 0.6578892

Optimal lambda:
 reweighted         raw 
0.008136297 0.008136297 

Final model:

sparseLTS(x = X[training, ], y = y[training], lambda = 0.00813629651262027)

  (Intercept)          8502         20929          1367           607          4454 
-9.1598328579  0.3310042335  0.0863775797  0.0002349935  0.1480977384  0.1319091169 
         1106         20125          2192          8510          8119          4717 
 0.0105991949 -0.0027565131 -0.0990009605  0.2099338113  0.0549507631  0.2071540476 
         8460          8120         18447         11697         18057          1209 
-0.0371748469  0.0196347548 -0.0963591076  0.1478456541  0.0945773068  0.0646499578 
        10193         16601          7696          8706         16784          3482 
 0.0015184415  0.0396195287  0.0346881311  0.0666896146  0.0254173677  0.4564974060 

Penalty parameter:       0.008136297
Residual scale estimate: 0.100884972

@lulukang, would it be possible for you to install the latest version from GitHub using


and check if it works for you now?

You'll need to install the package devtools if you don't have it already, as well as the necessary C++ compilers for Windows (Rtools) or macOS (Xcode).
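A minimal sketch of these installation steps, assuming the fix is available on the default branch of the aalfons/robustHD repository named at the top of this page; this is not necessarily the exact command posted in the comment:

```r
# Assumption: the development version lives in the GitHub repository
# "aalfons/robustHD" (default branch).

# Install devtools only if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Build and install robustHD from source; this requires Rtools on Windows
# or the Xcode command line tools on macOS
devtools::install_github("aalfons/robustHD")

# Report the installed version before rerunning the example
print(packageVersion("robustHD"))
```

After restarting R, rerunning the reproducer from the original report shows whether the crash persists.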

aalfons commented 2 years ago

Since the crash no longer occurs on my machine, the issue seems to be fixed in commit cd340ee55072221ef993f6440e3b4bb04f147b1e.

I'll therefore close this issue now. @lulukang, feel free to reopen it if the problem persists on your machines.