desihub / specter

A toolkit for simulating multi-object spectrographs

specter linear algebra failures #50

Status: Open. Opened by sbailey 7 years ago.

sbailey commented 7 years ago

Getting this into a ticket for debugging later:

The specter unit tests have started failing at NERSC following the desiconda/20170613-1.1.4-spectro upgrade. I have not checked whether they still pass with earlier versions of desiconda.

On edison scratch, /scratch2/scratchdirs/sjbailey/desi/code/specter/icov-fail.fits is a dump of a matrix that fails scipy.linalg.eigh(x) with the following message:

In [11]: scipy.linalg.eigh(x)
---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
<ipython-input-11-9224ea3c9fe1> in <module>()
----> 1 scipy.linalg.eigh(x)

/global/common/edison/contrib/desi/code/desiconda/20170613-1.1.4-spectro_conda/lib/python3.5/site-packages/scipy/linalg/decomp.py in eigh(a, b, lower, eigvals_only, overwrite_a, overwrite_b, turbo, eigvals, type, check_finite)
    385                           " fortran routine." % (-info))
    386     elif info > 0 and b1 is None:
--> 387         raise LinAlgError("unrecoverable internal error.")
    388 
    389     # The algorithm failed to converge.

LinAlgError: unrecoverable internal error.

Oddly, numpy.linalg.eigh(x) works fine.
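To narrow this down, a side-by-side check of both eigensolvers on the same matrix may help. This is a minimal sketch, not part of specter. `compare_eigh` is a hypothetical helper name. It also flags non-symmetric or non-finite input, since eigh only reads one triangle of the matrix and scipy's `check_finite` rejects NaN/inf.

```python
import numpy as np
import scipy.linalg


def compare_eigh(x, rtol=1e-8, atol=1e-10):
    """Hypothetical debugging helper: run numpy and scipy eigh on the same matrix."""
    x = np.asarray(x, dtype=np.float64)
    report = {
        # eigh only reads one triangle, so an asymmetric input is worth flagging
        "symmetric": bool(np.allclose(x, x.T, rtol=rtol, atol=atol)),
        "finite": bool(np.all(np.isfinite(x))),
    }
    for name, eigh in (("numpy", np.linalg.eigh), ("scipy", scipy.linalg.eigh)):
        try:
            w, v = eigh(x)
        except (np.linalg.LinAlgError, ValueError) as err:
            # scipy raises LinAlgError("unrecoverable internal error.") in the failing case
            report[name] = "{}: {}".format(type(err).__name__, err)
            continue
        # Residual ||x v - v diag(w)||; v * w scales column j of v by w[j]
        resid = np.linalg.norm(x.dot(v) - v * w)
        report[name] = "ok (residual {:.3g})".format(resid)
    return report
```

The dumped matrix could then be loaded with, for example, `astropy.io.fits.getdata("icov-fail.fits")` (assuming the dump is in the primary HDU) and passed to `compare_eigh` on each machine.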

tskisner commented 7 years ago

One more note on this: on edison we switched to IDP (Intel Distribution for Python) instead of stock anaconda, because Continuum no longer supports edison's old OS version (IDP still supports this version of SUSE Linux). If you see this error on edison but not on cori, it may be related to using Intel's builds of NumPy/SciPy.

sbailey commented 7 years ago

Indeed, this problem occurs on edison but not cori. desiconda/20170613-1.1.4-spectro on cori uses scipy 0.19.0, but on edison it uses scipy 0.18.1. Both edison and cori use numpy 1.11.3.

Using anaconda instead of IDP, I do not get this error on edison with scipy/0.18.1 + numpy/1.11.1.

On my laptop with anaconda, I do not get this error with scipy 0.18.0 and numpy 1.11.3. Detail: anaconda says that it is installing scipy 0.18.1, but scipy.__version__ reports 0.18.0.
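If a stopgap is needed while the IDP + scipy 0.18.1 combination is investigated, one option is to fall back to numpy's eigh, which succeeded on the same matrix. This is a hypothetical sketch (`robust_eigh` is not an existing specter function). It assumes numpy's result is acceptable for specter's purposes, and it hides rather than fixes the underlying problem, so it emits a warning each time the fallback is used.

```python
import warnings

import numpy as np
import scipy.linalg


def robust_eigh(x):
    """Hypothetical workaround: try scipy.linalg.eigh, fall back to numpy.linalg.eigh."""
    try:
        return scipy.linalg.eigh(x)
    except np.linalg.LinAlgError as err:
        # Observed with IDP scipy 0.18.1 on edison, where numpy.linalg.eigh succeeded
        warnings.warn(
            "scipy.linalg.eigh failed ({}); falling back to numpy.linalg.eigh".format(err)
        )
        return np.linalg.eigh(x)
```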

I have not succeeded in testing IDP with a more recent release of scipy due to some conda oddness that claims to upgrade scipy but apparently doesn't:

(blat) [edison07 ~] conda list scipy
# packages in environment at /global/homes/s/sjbailey/.conda/envs/blat:
#
scipy                     0.19.0          np112py35_intel_2  [intel]  intel
(blat) [edison07 ~] which python
/global/homes/s/sjbailey/.conda/envs/blat/bin/python
(blat) [edison07 ~] python -c "import scipy; print(scipy.__version__)"
0.18.1
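A mismatch like this (conda lists 0.19.0, the interpreter imports 0.18.1) can happen when another copy of the package earlier on `sys.path` shadows the one conda installed. That is a guess, not a confirmed cause. A minimal sketch for checking it: report the interpreter, the version string the module reports, the version in installed package metadata, and the file the module was actually imported from. `describe_package` is a hypothetical name. `importlib.metadata` needs Python 3.8+, so on the Python 3.5 environment in this thread only the `__version__` and `__file__` parts would apply.

```python
import importlib
import importlib.metadata  # Python 3.8+
import sys


def describe_package(name):
    """Hypothetical helper: compare the imported module with installed metadata."""
    module = importlib.import_module(name)
    try:
        installed = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        installed = None
    return {
        "python": sys.executable,
        "imported_version": getattr(module, "__version__", None),
        "installed_version": installed,
        # The file path shows which copy on sys.path actually won the import
        "imported_from": getattr(module, "__file__", None),
    }
```

For example, `for pkg in ("numpy", "scipy"): print(pkg, describe_package(pkg))`. An `imported_from` path outside `~/.conda/envs/blat` would point to a shadowing install.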