Reference-LAPACK / lapack

LAPACK development repository

INFO is not propagated inside ZGESVDX #414

Open DmitryLyakh opened 4 years ago

DmitryLyakh commented 4 years ago

I was using the ZGESVDX routine to compute an incomplete SVD and noticed cases where the output matrices returned from this subroutine were all zeros while INFO did not show any error (INFO = 0). I looked at the source: INFO is passed to multiple internal calls inside ZGESVDX, but it is never checked after any of them, so there is no stop condition if one of them fails. I am now wondering whether one of the internal calls may return a non-zero INFO which a subsequent internal call then resets to zero, erasing the previously reported error and resulting in a success status from ZGESVDX (with wrong results). In any case, getting wrong output matrices (zeros) together with a success status in INFO is a real correctness problem for applications using this routine. Could you please comment on this?
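To make the suspected failure mode concrete, here is a minimal sketch of the pattern, not the actual ZGESVDX source. The three stage functions and their return values are hypothetical stand-ins for internal calls that each report an INFO-style status; the second one is made to fail on purpose.

```python
# Hypothetical stand-ins for the internal stages of a driver routine.
# Each returns an INFO-style status: 0 on success, > 0 on failure.
def reduce_to_bidiagonal():
    return 0


def bidiagonal_svd():
    return 3  # pretend this stage failed


def back_transform():
    return 0


STAGES = (reduce_to_bidiagonal, bidiagonal_svd, back_transform)


def driver_overwriting_info():
    """Reuses one INFO variable and never checks it between stages."""
    info = 0
    for stage in STAGES:
        info = stage()  # a later success overwrites an earlier failure
    return info


def driver_propagating_info():
    """Stops at the first non-zero INFO and returns it to the caller."""
    for stage in STAGES:
        info = stage()
        if info != 0:
            return info
    return 0


print(driver_overwriting_info())  # the failure of the second stage is lost
print(driver_propagating_info())  # the failure reaches the caller
```

The first driver reports success even though a stage failed, which is the behaviour described above; the second shows the kind of check that would be needed after each internal call.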

oamarques commented 4 years ago

Please send me a case (matrix) for which you have found the problem and I will try to reproduce it.

DmitryLyakh commented 4 years ago

The problematic matrix and other details are below. To clarify, I am using OpenBLAS from github (https://github.com/xianyi/OpenBLAS.git) which contains the lapack-netlib directory (version 3.9.0) where I believe the zgesvdx implementation is coming from (I think it is simply a copy of the NetLib LAPACK). I am running this on Ubuntu-18.04 with gcc/8.2.0.

Relevant arguments of the zgesvdx call: call zgesvdx('V','V','I',6,4,array,6,0d0,0d0,1,4,nfound,array,array,6,array,4,array,integer_lwork,array,array,info), where integer_lwork is determined by a preceding workspace-query call, nfound is 0 on return, info is 0 on return, and the arrays are all distinct and sufficiently sized (no aliasing). The first question, I guess, is whether nfound = 0 together with info = 0 is considered a success, and if so, why nfound is zero. The input matrix has dimensions (6,4):

(0.51661277109851045,-6.89963957749677833E-020) (0.20236702800339124,-1.57998582682269510E-002) (-1.11221180166257759E-009,6.17263197549497758E-012) (-0.20817768988422278,0.20817768988422275) (8.79139675480439531E-002,7.51803373368371552E-002) (-6.65371235976253624E-010,9.68787151825317375E-010)
(0.51661277109851045,6.89963957749677833E-020) (-0.20236702800339126,1.57998582682269510E-002) (-9.54091751895852986E-010,-6.17263197549497758E-012) (0.20817768988422278,-0.20817768988422275) (8.79139675480439531E-002,7.51803373368371552E-002) (9.92476428255562827E-010,-6.89060512406498662E-010)
(-4.46983635113560036E-171,1.16132994629522048E-173) (-3.40619081799003549E-155,2.65939232738431695E-156) (1.33071841536981804E-164,-1.03896185881676090E-165) (-3.50399441445635989E-155,3.50399441445635948E-155) (0.0000000000000000,7.82334843009561776E-172) (-1.39522370497974847E-163,1.39522370497974817E-163)
(-4.46983635113560036E-171,1.16132994629522048E-173) (-3.40619081799003549E-155,2.65939232738431695E-156) (1.33071841536981804E-164,-1.03896185881676090E-165) (-3.50399441445635989E-155,3.50399441445635948E-155) (0.0000000000000000,7.82334843009561776E-172) (-1.39522370497974847E-163,1.39522370497974817E-163)
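Until the routine itself reports the failure, a caller can guard against this case. As far as I read the ZGESVDX documentation, with RANGE = 'I' a successful call should return NS = IU - IL + 1, so NS = 0 with INFO = 0 is inconsistent. The sketch below encodes that check; the function name and its messages are hypothetical, and it only inspects values already returned by the call.

```python
def check_index_range_result(info, ns, il, iu):
    """Return None if the outputs look consistent, else a short message.

    Assumes the call used RANGE = 'I', where a successful return is
    expected to report NS = IU - IL + 1 singular values.
    """
    if info != 0:
        return f"routine reported INFO = {info}"
    expected = iu - il + 1
    if ns != expected:
        return f"INFO = 0 but NS = {ns}, expected {expected}"
    return None


# Values from the call above: IL = 1, IU = 4, nfound = 0, info = 0.
print(check_index_range_result(info=0, ns=0, il=1, iu=4))
```

A non-None result here means the output matrices should not be trusted even though INFO is zero.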

DmitryLyakh commented 4 years ago

The specific OpenBLAS commit I am using is af8a619e1fbb4e41c566453baeb3b4e523c92337. This is in case you need to compare their copied netlib-lapack source with the main netlib-lapack repo.

oamarques commented 4 years ago

Thank you, Dmitry. This week is bad. I will take a look at this next week. Osni

oamarques commented 4 years ago

Ooops. Clicked on the wrong button. Issue reopened.

DmitryLyakh commented 4 years ago

On a related note, for a different but also very small matrix (4-by-4) we are getting the message "internal error occurred in DBDSVDX". What exactly does this mean? Is it a numerical issue or something more fundamental?

oamarques commented 4 years ago

I confirm there is a bug in the code: it does not propagate an error (INFO > 0) returned by DBDSVDX. The 6x4 matrix has two very small singular values, one of which is rounded to zero, and this is causing problems. The zero should not be a problem, but I have to figure out what is going on.
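A quick way to see where the two tiny singular values come from is to look at the scale of the last twelve entries of the matrix as listed in this thread. Assuming the listing is in column-major order (an assumption, the thread does not say), those entries are columns 3 and 4, which are identical, and the two smallest singular values can be at most on the order of the norm of those columns. The sketch uses only the Python standard library; `math.hypot` with several arguments (Python 3.8+) is used because squaring values near 1e-155 directly lands below the smallest normal double.

```python
import math

# Last twelve listed entries form two identical groups of six; under the
# column-major assumption each group is one column (columns 3 and 4).
column = [
    complex(-4.46983635113560036E-171, 1.16132994629522048E-173),
    complex(-3.40619081799003549E-155, 2.65939232738431695E-156),
    complex(1.33071841536981804E-164, -1.03896185881676090E-165),
    complex(-3.50399441445635989E-155, 3.50399441445635948E-155),
    complex(0.0000000000000000, 7.82334843009561776E-172),
    complex(-1.39522370497974847E-163, 1.39522370497974817E-163),
]

# Euclidean norm of the column, computed with internal scaling.
norm = math.hypot(*(abs(z) for z in column))
print(norm)  # on the order of 1e-155
```

Values this small relative to the leading entries (about 0.5) are far below double-precision rounding error for this matrix, which is consistent with one singular value being rounded to zero.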

oamarques commented 4 years ago

The problem has been identified and potentially fixed. More tests need to be performed before committing the changes.