dmargala opened this issue 3 years ago.
For consideration: does it help to insert `icov = 0.5*(icov + icov.T)` to ensure exact symmetry before calling `flux = solve(icov, y)`? specter does that in `specter.ex2d.resolution_from_icov` (line 533) to provide some robustness against rounding errors that result in slightly non-symmetric matrices. It doesn't apply that earlier in the code, where `scipy.sparse.linalg.spsolve` is called, so it isn't clear to me whether it is really needed or was added while chasing a red herring.
I think it's worth trying in gpu_specter to see whether it helps or significantly changes the situation. Otherwise, at its core, it looks like we need more regularization for ill-constrained patches to improve matrix conditioning.
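A minimal sketch of both ideas, assuming NumPy/SciPy and a dense `icov`: symmetrize exactly, then add a small ridge to the diagonal before a Cholesky solve. The relative strength `eps` and scaling the ridge by the mean diagonal entry are hypothetical choices for illustration, not what specter or gpu_specter actually do.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def symmetrize(icov):
    """Average icov with its transpose so the result is exactly symmetric."""
    return 0.5 * (icov + icov.T)


def is_positive_definite(icov):
    """Return True if a Cholesky factorization of icov succeeds."""
    try:
        np.linalg.cholesky(icov)
    except np.linalg.LinAlgError:
        return False
    return True


def regularized_solve(icov, y, eps=1e-8):
    """Solve (sym(icov) + ridge * I) flux = y with a Cholesky factorization.

    eps is a hypothetical relative regularization strength; the ridge is
    scaled by the mean diagonal entry so it tracks the overall size of icov.
    """
    icov = symmetrize(icov)
    ridge = eps * np.mean(np.diag(icov))
    c_and_lower = cho_factor(icov + ridge * np.eye(icov.shape[0]))
    return cho_solve(c_and_lower, y)


# Toy example: the third "spectrum" is completely unconstrained.
icov = np.array([[2.0, 1.0, 0.0],
                 [1.0 + 1e-12, 2.0, 0.0],  # tiny asymmetry standing in for rounding
                 [0.0, 0.0, 0.0]])
y = np.array([3.0, 3.0, 0.0])

print(is_positive_definite(symmetrize(icov)))  # False: zero row/column
print(regularized_solve(icov, y))              # approximately [1, 1, 0]
```

Note that symmetrization alone does not rescue this toy matrix; only the ridge makes it positive definite, and the unconstrained flux then comes out as 0 rather than NaN.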
I have tried experimenting with the symmetrization trick, but I think there is a bigger issue at play.
Here is an interesting example from:

```python
night = '20210629'
expid = 96676
camera = 'r3'
specrange = (365, 369)
waverange = (7160, 7199.2)
```
It looks like one spectrum is completely missing from that patch. Quite a challenge for extraction!
[Figure: `solve(icov, y)` for that missing spectrum]
The overlapping zeros in this plot show that the solution is going to be poorly constrained:
In specter, `spsolve` doesn't return NaNs on that patch, but the result is less than ideal:
I processed 498 exposures (379 night-tiles) from 2021-04 with gpu_specter.
There were 772 patches with flux=NaN due to silent failure at the Cholesky solve step (icov is not positive definite). The NaNs are eventually zeroed out before writing to disk.
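A serial NumPy stand-in for this failure mode, as a sketch: each patch is solved with a Cholesky factorization, a patch whose `icov` is not positive definite is left as NaN instead of raising, and the NaN patches are then counted and zeroed. `solve_patches` and `zero_failed` are hypothetical helpers, not gpu_specter functions; the real code runs the whole batch on the GPU rather than looping.

```python
import numpy as np


def solve_patches(icovs, ys):
    """Cholesky-solve a stack of patches; failed patches come back as NaN.

    icovs has shape (npatch, n, n) and ys has shape (npatch, n).
    """
    flux = np.full(ys.shape, np.nan)
    for k, (icov, y) in enumerate(zip(icovs, ys)):
        try:
            L = np.linalg.cholesky(icov)   # icov = L @ L.T
        except np.linalg.LinAlgError:
            continue                       # not positive definite: keep NaN
        z = np.linalg.solve(L, y)          # forward solve: L z = y
        flux[k] = np.linalg.solve(L.T, z)  # back solve: L^T flux = z
    return flux


def zero_failed(flux):
    """Zero out NaN patches and report how many there were."""
    failed = np.isnan(flux).any(axis=1)
    cleaned = np.where(failed[:, None], 0.0, flux)
    return cleaned, int(failed.sum())


icovs = np.array([[[2.0, 0.0], [0.0, 2.0]],    # positive definite
                  [[1.0, 2.0], [2.0, 1.0]]])   # indefinite: eigenvalues 3 and -1
ys = np.array([[2.0, 4.0], [1.0, 1.0]])
flux, nfailed = zero_failed(solve_patches(icovs, ys))
print(nfailed)  # 1
```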
I hacked gpu_specter to save an image of the patches when extraction fails. It looks like nearly all the affected patches have several masked pixels. See:
https://data.desi.lbl.gov/desi/spectro/redux/dmargala/43429/nan/nan-table.html
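To illustrate how masked pixels can break positive definiteness, here is a toy sketch assuming the weighted least-squares form `icov = A.T @ Ninv @ A`, with `Ninv = diag(ivar * (mask == 0))`. The projection matrix is hypothetical: each spectrum illuminates its own four pixels with no overlap, which is far simpler than a real PSF. When every pixel in one spectrum's footprint is masked, that spectrum's row and column of `icov` are exactly zero, so `icov` is singular and no Cholesky solve can succeed.

```python
import numpy as np


def toy_icov(mask):
    """Build icov = A^T Ninv A for a hypothetical 3-spectrum, 12-pixel patch.

    Spectrum j only illuminates pixels 4j..4j+3 (no overlap between spectra).
    """
    npix, nspec = 12, 3
    profile = np.array([0.2, 0.8, 0.8, 0.2])  # made-up cross-section
    A = np.zeros((npix, nspec))
    for j in range(nspec):
        A[4 * j:4 * j + 4, j] = profile
    ivar = np.ones(npix)                # unit inverse variance
    Ninv = np.diag(ivar * (mask == 0))  # masked pixels get zero weight
    return A.T @ Ninv @ A


mask = np.zeros(12, dtype=int)
good = toy_icov(mask)
mask[4:8] = 1  # mask every pixel of spectrum 1
bad = toy_icov(mask)

print(np.linalg.matrix_rank(good), np.linalg.matrix_rank(bad))  # 3 2
```

In real patches the spectra overlap, so partial masking presumably more often yields a nearly singular `icov` that rounding can push slightly away from positive definite, rather than an exact zero row.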
For reference, here is a simplified version of the extraction code:
where `p` is the pixel vector (`image.pix`), `Ninv` is the inverse pixel noise matrix (`diag(image.ivar*(image.mask == 0))`), and `A` is the projection matrix. In practice, the diagonal entries of `icov` are modified when the corresponding pixel weights are below some threshold.

For one of the failures, I compared the `icov` and `y` between specter and gpu_specter (gpu). They are mostly similar, though not identical because the patch padding is a little different, but I don't see anything obviously out of place between the two. The `icov` from specter is not positive definite either, so the likely culprit is that the Cholesky solve in gpu_specter (`cupyx.lapack.posv`) is less forgiving than the solve in specter (`scipy.sparse.linalg.spsolve`).

Cholesky solve is used in gpu_specter because there is a batched implementation in CuPy/CUDA cuSolver, which allows an entire sub-bundle of patches to be extracted in parallel on the GPU.
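A small CPU sketch of why the two solvers can disagree, assuming NumPy and SciPy: `scipy.sparse.linalg.spsolve` uses an LU factorization, which only needs the matrix to be nonsingular, while a Cholesky-based solve such as `posv` also requires positive definiteness. The 2x2 matrix is made up; its tiny negative eigenvalue stands in for rounding error in a nearly singular `icov`.

```python
import numpy as np
import scipy.sparse
import scipy.sparse.linalg

# Symmetric, nearly singular, with one tiny negative eigenvalue (about -5e-13).
icov = np.array([[1.0, 1.0],
                 [1.0, 1.0 - 1e-12]])
true_flux = np.array([1.0, 1.0])
y = icov @ true_flux


def cholesky_succeeds(a):
    """Return True if a Cholesky factorization of a succeeds."""
    try:
        np.linalg.cholesky(a)
    except np.linalg.LinAlgError:
        return False
    return True


# LU-based sparse solve: only needs icov to be nonsingular.
lu_flux = scipy.sparse.linalg.spsolve(scipy.sparse.csc_matrix(icov), y)

print(cholesky_succeeds(icov))  # False: Cholesky rejects the matrix
print(lu_flux)                  # finite and close to [1, 1], but ill-conditioned
```

The LU path returns a number rather than failing, but on an ill-conditioned patch that number can be poor, consistent with the finite but less-than-ideal specter result on the patch with the missing spectrum.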
Note that there are NaNs from specter extraction in the everest logs as well, but they are less common. I count 10 from specter vs. 772 from gpu_specter for the 2021-04 data: