SzymonNowakowski / DMRnet

This is a development version of DMRnet — Delete or Merge Regressors Algorithms for Linear and Logistic Model Selection and High-Dimensional Data.

`hard_case_cv_airbnb.R` fails on `DLASCL` #37

Open SzymonNowakowski opened 2 years ago

SzymonNowakowski commented 2 years ago

While running a large part of hard_case_cv_airbnb.R, the following error occurred (in addition to many dgesdd errors that were caught by a try-catch block):

run 96 cvg.DMRnet with cv indexed by gic
 ** On entry to DLASCL parameter number  4 had an illegal value

For some reason it was not caught by a try-catch block. Maybe it was in predict?

Definitely something for future investigation. However, since it was not related to CV, I'll let it pass.

Additional information to reproduce this bug:

$ git log --oneline
fc1c6a8 (HEAD -> testing_branch, origin/testing_branch) removing 0.02 percent test

and

> print(sessionInfo())
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.2.1

SzymonNowakowski commented 2 years ago

I was able to reproduce it with LAPACK 3.10.3 on the dg machine. Interestingly, this time it happened on the 95th run (?).

The try-catch block doesn't catch it; it crashes the entire R environment.
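
Since tryCatch() only intercepts R conditions, it cannot help when the whole process is terminated from inside the linear algebra library (that this is what happens here is an assumption based on the console dying). A possible workaround, sketched below, is to run the risky svd() call in a separate Rscript process so that only the child process is lost. The helper name `svd_in_child` is hypothetical and not part of DMRnet; the sketch assumes a Unix-like system with Rscript available.

```r
# Hypothetical helper (not part of DMRnet): compute svd(x) in a child
# Rscript process. If the child crashes, NULL is returned and the
# calling R session keeps running.
svd_in_child <- function(x) {
  in_file  <- tempfile(fileext = ".rds")
  out_file <- tempfile(fileext = ".rds")
  on.exit(unlink(c(in_file, out_file)), add = TRUE)

  # Pass the matrix to the child through a temporary file.
  saveRDS(x, in_file)

  # Expression evaluated by the child: read, decompose, write the result.
  expr <- sprintf("saveRDS(svd(readRDS(%s)), %s)",
                  deparse(in_file), deparse(out_file))

  status <- system2(file.path(R.home("bin"), "Rscript"),
                    c("--vanilla", "-e", shQuote(expr)),
                    stdout = FALSE, stderr = FALSE)

  # A non-zero exit status or a missing result file means the child failed.
  if (status != 0 || !file.exists(out_file)) {
    return(NULL)
  }
  readRDS(out_file)
}
```

With this in place, a call such as `res <- svd_in_child(crashes_svd)` followed by `is.null(res)` would let the main loop skip the failing run instead of losing the session, at the cost of serializing the matrix to disk for each call.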

Steps to reproduce

The matrix appears to be perfectly valid: no NA values, no Inf values, and full column rank. Nevertheless, it crashes svd():

library(Rfssa)
load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_DLASCL_Lapack_3.10.3.RData")
sum(!is.finite(crashes_svd))
# [1] 0
sum(is.na(crashes_svd))
# [1] 0
qr(crashes_svd)$rank
# [1] 1928
dim(crashes_svd)
# [1] 1999 1928
svd(crashes_svd)
 ** On entry to DLASCL parameter number  4 had an illegal value
$        #the R console crashes
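
In the reference LAPACK source, the fourth argument of DLASCL is CFROM, and the routine reports it as illegal when it is zero or NaN. Whether the scale of the input matrix plays any role here is only a guess, but a few extra scale diagnostics are cheap to collect before calling svd(). The helper name `matrix_scale_summary` below is hypothetical; the sketch uses base R only and assumes a numeric matrix without NA values, as is the case for crashes_svd.

```r
# Hypothetical diagnostic helper: summarize the magnitude of the entries
# of a numeric matrix (assumed to contain no NA values).
matrix_scale_summary <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  a <- abs(x)
  nonzero <- a[a > 0]
  list(
    # Largest absolute entry.
    max_abs = max(a),
    # Smallest non-zero absolute entry (NA if the matrix is all zeros).
    min_nonzero_abs = if (length(nonzero) > 0) min(nonzero) else NA_real_,
    # Number of entries that are exactly zero.
    n_zero = sum(a == 0),
    # Number of columns whose entries are all identical.
    n_constant_cols = sum(apply(x, 2, function(col) all(col == col[1]))),
    # Number of non-zero entries below the smallest normalized double.
    n_denormal = sum(a > 0 & a < .Machine$double.xmin)
  )
}
```

Running `str(matrix_scale_summary(crashes_svd))` after loading the data would add these figures to the bug report alongside the rank and dimension checks above.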

SzymonNowakowski commented 2 years ago

Reported to the LAPACK project on GitHub as issue #743.

SzymonNowakowski commented 2 years ago

It also fails the same way on the dg machine (LAPACK 3.10.3) with DMRnet version 4ce88a1 and airbnb.R version f53baed, on the 134th computation of GLAMER with cv indexed by model dimension:

133 median =  0.1914894
133 df.min =  31.24812
133 lengths =  1329.805
generating train/test sets
removed 431 columns due to singular values
n= 999 p= 2234 test= 1363
Started:  1667584429
GLAMER with cv indexed by model dimension
Numerical instability in model creation in CV (cv.glamer) detected. Will skip this 1-percent set. Original error:
Unable to perform cross validation. Empty test set in one of the folds.
generating train/test sets
removed 431 columns due to singular values
n= 999 p= 2228 test= 1381
Started:  1667584472
GLAMER with cv indexed by model dimension
 ** On entry to DLASCL parameter number  4 had an illegal value