RBigData / pbdBASE

Set of base classes and methods for the pbdR project.
http://r-pbd.org/
Mozilla Public License 2.0
8 stars 5 forks source link

Warnings instead of errors notifying of possibly wrong ScaLAPACK results might go unnoticed and cause undetected false results #34

Open mlell opened 6 years ago

mlell commented 6 years ago

When a singular matrix is inverted using the pbdDMAT package, an invalid result is produced, with a warning:

# Run in shell: mpirun -n 2 Rscript example2.R
pbdBASE::init.grid()
if(pbdMPI::comm.rank() == 0){
  A <- matrix(1:9,3,3) # Singular
}else{
  A <- NULL
}

dA <- pbdDMAT::as.ddmatrix(A)
dS <- pbdDMAT::solve(dA)
S <-  pbdDMAT::as.matrix(dS, proc.dest = 0)

if(pbdMPI::comm.rank() == 0){
  print(S)
}

pbdBASE::finalize()

Output when called as mpirun -n 2 Rscript example2.R

Using 2x1 for the default grid size

Warning message:
In base.rpdgetri(n = n, a = a@Data, desca = desca) :
  ScaLAPACK returned INFO=3; returned solution is likely invalid
          [,1] [,2] [,3]
[1,] 3.0000000  6.0    9
[2,] 0.3333333  2.0    4
[3,] 0.6666667  0.5    0

The base R function throws an error in this case:

solve(matrix(1:9, 3,3))
## Error in solve.default(matrix(1:9, 3, 3)) : 
##   Lapack routine dgesv: system is exactly singular: U[3,3] = 0

Because code like the above is used non-interactively and is possibly itself called many times in an automated fashion, the warning message might well be missed in certain setups. In that case false results would corrupt downstream analyses, possibly without being ever noticed, which is very dangerous.

I would therefore propose to replace the relevant comm.warning(.... lines in R/base_lm.R and R/base_scalapack.R to generate errors in that case.

wrathematics commented 6 years ago

This is a good point. The base/dmat design has some idiosyncrasies about it for some historical reasons. The current goal is to keep the base wrappers somewhat close to scalapack itself (albeit slightly higher level), which assumes you do your own error checking. I think I may instead send error codes as part of the return (or as an attribute maybe) in base and then if there's a non-zero return, error on the dmat side.