desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License

Add row-by-row extraction and CTE correction modules #2144

Closed by schlafly 8 months ago

schlafly commented 9 months ago

This PR adds two components to the pipeline.

These are connected because the CTE correction routine uses the row-by-row extraction to model images. Spectroperfectionism would be better but is slower; the modeling occurs in a loop ~5-10 times for each CTE-affected frame, so speed was desirable. I also experimented with the simpler aperture qproc.extract routine, which was plenty fast but did a poor job of preventing light from neighboring spectra from contaminating the target spectra. This was a particular problem for bright tiles, where we want good sky spectra (faint) but the target spectra are rather bright. We settled on row-by-row extraction as a compromise.

The main CTE correction algorithm looks like this:

The real challenges here are getting a good model of what CTE does to an image, and correcting for it in a way that doesn't correlate the read noise. With Nicolas Regnault's help we have a pretty decent model (though we could study this more...), and the row-by-row extraction modeling handles the second part well enough.
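The model-then-correct loop described above can be sketched with a toy 1D forward model. Everything here is illustrative: apply_cte_toy stands in for the real charge-trap model and the fixed-point update stands in for the row-by-row-extraction-based image model; none of it is desispec API.

```python
import numpy as np

def apply_cte_toy(img, frac=0.2):
    # toy CTE: each pixel leaks a fixed fraction of its charge into the
    # next pixel along the readout direction (charge is conserved)
    out = img.astype(float).copy()
    out[1:] += frac * img[:-1]
    out[:-1] -= frac * img[:-1]
    return out

def correct_iteratively(observed, niter=5, frac=0.2):
    # fixed-point iteration: forward-model the CTE trailing of the current
    # guess and use the residual to update the guess
    guess = observed.astype(float).copy()
    for _ in range(niter):
        guess = observed - (apply_cte_toy(guess, frac) - guess)
    return guess

truth = np.zeros(10)
truth[4] = 100.0
observed = apply_cte_toy(truth)          # charge trailed into pixel 5
corrected = correct_iteratively(observed, niter=20)
```

Because the toy forward model is a small perturbation of the identity, the iteration converges geometrically; that is why a handful of iterations per frame suffices.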

Here are some QA comparisons. The following PDFs have interleaved daily and this-PR-on-z1-only reductions of a lot of exposures from 20230825 - 20230906; the first page in each pair is daily, the second is the new reduction:

https://data.desi.lbl.gov/desi/users/schlafly/misc/schlafly-cte-5-tileqa.pdf
https://data.desi.lbl.gov/desi/users/schlafly/misc/schlafly-cte-5-skyzfiber.pdf

To drill down more, look at the nightqa in https://data.desi.lbl.gov/desi/spectro/redux/schlafly-cte-5/nightqa/

But here's at least one comparison plot to show that when things are bad, the new code makes a big improvement (only z1 has the new code active):

Daily: [image]
New: [image]

And finally, one plot to show that we're not increasing the noise:

[image]

This just gives the median absolute deviation, converted to a standard deviation, of the sky fibers in all of the exposures I've run, as a function of exposure index, for z1, only in the CTE-affected region. The new code is slightly better than the old code. I think this understates the improvement---most of the issues in daily are on sky lines or individual fibers, which the clipping does a good job of removing---but it at least says that things aren't worse.
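For reference, the MAD-to-standard-deviation conversion used in a plot like this is the standard robust estimator; this is a generic sketch, not the pipeline's actual QA code.

```python
import numpy as np

def mad_sigma(x):
    # median absolute deviation, scaled so that it estimates the Gaussian
    # standard deviation (sigma ~ 1.4826 * MAD for normally distributed
    # data); robust against sky-line and bad-fiber outliers
    med = np.median(x)
    return 1.4826 * np.median(np.abs(x - med))

rng = np.random.default_rng(42)
sky = rng.normal(0.0, 2.0, size=100000)
sky[:100] = 50.0   # a few gross outliers barely perturb the estimate
```

On this example mad_sigma stays close to the true 2.0 while a plain standard deviation is pulled well above it by the outliers, which is why the robust version is the natural choice for sky-fiber noise comparisons.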

The optimal row-by-row extraction routine is intended to be the usual thing:

Maybe a few things worth discussing in the extraction:

The CTE correction module does a few different CTE-related things.

Applying CTE to images:

Correcting CTE:

Fitting CTE parameters:

Metadata:

schlafly commented 9 months ago

Maybe one more bit of code that could be useful but isn't in the PR proper... this is the hack I used to call my CTE correction code on existing preproc files.

I don't think we really want this---we should build it into the preproc code---but this is the easiest way to run the current code on an existing preproc image.

# (imports added for completeness)
import desispec.io
import desispec.io.util
from desispec.io import read_fibermap
from desispec import correct_cte

def process(x):
    # read an existing daily preproc image; skip non-science exposures
    im = desispec.io.read_image('/global/homes/s/schlafly/desiproject/spectro/redux/daily/preproc/'+x)
    if im.meta['OBSTYPE'] != 'SCIENCE':
        return
    # the fibermap travels in the same preproc file
    fmap = read_fibermap('/global/homes/s/schlafly/desiproject/spectro/redux/daily/preproc/'+x)
    desispec.io.util.addkeys(fmap.meta, im.meta, skipkeys=('BZERO', 'BSCALE'))
    im.fibermap = fmap
    # iteratively correct CTE using the forward model, then write out
    newim = correct_cte.correct_image_via_model(im, niter=5)
    desispec.io.write_image(x, newim)
schlafly commented 9 months ago

Here are a couple of example images showing how the trap parameters have evolved with time, on z1 and z3.

Note that when the trap amplitudes are small (e.g., less than ~50), the existing corrections are adequate and we probably don't need to turn on this special mode. Moreover, the uncertainty in the leakiness isn't a big deal as the amplitude gets lower, since the trap is just not trapping as many electrons and the correction becomes quite small.
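The amplitude-based decision described here could be sketched as follows; the ~50 electron threshold is taken from the discussion and is illustrative, not a tuned pipeline constant, and the function name is hypothetical.

```python
def needs_special_cte_mode(trap_amplitude, threshold=50.0):
    # heuristic from the discussion: below roughly 50 electrons of trap
    # amplitude the existing corrections are adequate, so the special
    # correction mode can stay off (threshold is illustrative)
    return trap_amplitude >= threshold
```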

[image: z1 trap parameter evolution]
[image: z3 trap parameter evolution]

Stephen expressed some concern about the single point on z1 with a highly outlying amplitude of ~40. I haven't looked at that one case, but mostly I'm amazed at how well this worked with the simple scheme of just grabbing one 1 s flat and the previous image for each night. As a reminder, a lot can go wrong with the cals. It may make sense to add the fit parameters to nightly QA, or to compare them with the previous night's values, to help us identify when something has gone wrong with the nightly cals. I don't know what the usual mechanisms are there.

There is also room to be concerned about the higher chi^2 / dof on pre-wildfire nights on z1. The suggestion is that the CTE looked different then and the existing model can't fit it well. I don't think we should apply the correction to that data, and I can certainly imagine that we'll need to develop new prescriptions in the future to address such cases. But the existing large CTE effects on z1, z3, and r6 that are poorly corrected are all well modeled by this parameterization.

schlafly commented 9 months ago

Stephen asked me to diagnose the one night where z1 shows dramatically lower CTE than usual, night index ~550 or so. It turns out that that night is 20230822, the night we experimented with different voltage settings on z1. We ended up not keeping the new voltages because they increased the read noise, but they did dramatically improve the CTE. That is, this point is not an outlier after all, but a reflection of a real change in the CTE in that exposure.

As part of diagnosing this I computed appropriate CTE model parameters for all of the traps I know about through today and stored them here: https://data.desi.lbl.gov/desi/users/schlafly/cteparams/cteparamsall.fits These are pretty fast to compute; doing all nights from 2022/1 through today took O(15 min) on one interactive node. There's a corresponding PDF here: https://data.desi.lbl.gov/desi/users/schlafly/cteparams/cteparamtrends.pdf

It looks like at least one of the outliers on z5 is due to a particularly vertical CR. I should be able to do better with the masking there.

julienguy commented 8 months ago

Test report: I ran the CTE parameter fit on the night 20231111 for z1 and z3.

export SPECPROD=daily

then in python

from desispec.correct_cte import fit_cte_night
fit_cte_night(20231111,"z1")
# {'C': {'543:2057': (array([139.49198384,   0.20785951]), 0.5409279496512583)}}
fit_cte_night(20231111,"z3")
# {'C': {'1668:2057': (array([91.33659997,  0.22315425]), 1.0845442585069123)}}
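The returned structure above appears to map amplifier -> "start:stop" column range -> (parameter array, chi^2/dof), with the parameter array holding the trap amplitude and leakiness. The unpacking below is a sketch inferred from the printed output, not from documented desispec API.

```python
import numpy as np

# structure inferred from the printed fit_cte_night output above
result = {'C': {'543:2057': (np.array([139.49198384, 0.20785951]),
                             0.5409279496512583)}}

for amp_name, traps in result.items():
    for colrange, (params, chi2dof) in traps.items():
        amplitude, leakiness = params
        start, stop = (int(v) for v in colrange.split(':'))
        print(f"amp {amp_name}, cols {start}-{stop}: "
              f"amplitude={amplitude:.1f}, leakiness={leakiness:.3f}, "
              f"chi2/dof={chi2dof:.2f}")
```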

I edited the library code to add those parameters in order to run it, then wrote a script with the following lines and ran it on exposure NIGHT=20231111 EXPID=00204371, a 1 s LED exposure.

    # (imports added for completeness; args comes from the script's argparse setup)
    from desispec.io import read_image, read_fibermap, write_image
    from desispec.io.util import addkeys
    from desispec.correct_cte import correct_image_via_model

    im = read_image(args.infile)
    fmap = read_fibermap(args.infile)
    addkeys(fmap.meta, im.meta, skipkeys=('BZERO', 'BSCALE'))
    im.fibermap = fmap
    newim = correct_image_via_model(im, niter=args.niter)
    write_image(args.outfile, newim)

z1 before: [preproc-z1-00204371-before]
z1 after: [preproc-z1-00204371-after]
z3 before: [preproc-z3-00204371-before]
z3 after: [preproc-z3-00204371-after]

The correction is almost perfect for z1 and pretty good for z3. We may need to add more calibration data for the modeling.

schlafly commented 8 months ago

I have added some simple unit tests; take a look. They were a bit harder to write than I wanted: it doesn't look like it's easy to make a simple specter PSF object, so I used the test PSF object that specter itself uses. That object is rather more detailed than I wanted for my unit tests---I would have been happy with straight traces, sum(profile) = 1, and no negative pixels in the profile, for example. Those details make sanity checks less easy.

Of particular note is that the 1D profile in specter contains "tails", which are usually zero in real DESI data. I spent a while not realizing that my model and extraction were inconsistent because I had used tails=True (the default) in onedprofile_gh and tails=False (the default) in extract, and so I was failing these tests and trying to figure out how I had screwed up so badly. This doesn't matter in real data since the tails are zero there, but it does in the test data. I don't think there are any real bugs here. It would be nice if the defaults for extract(...) and onedprofile_gh(...) were the same---that would certainly minimize surprise---but on the other hand it feels like the default for onedprofile_gh should be to do everything and for extract(...) to do the efficient thing, so I haven't changed the behavior here.

That said, here's what the checks do:

Does that sound adequate?

julienguy commented 8 months ago

The test looks great. I would suggest encapsulating the functions in a class inheriting from unittest.TestCase, and adding the convenience block

if __name__ == '__main__':
    unittest.main()      

as is done, for instance, in desispec/test/test_coadd.py
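Concretely, the suggested structure would look something like the sketch below; the class and test names are illustrative placeholders, not the real desispec tests, and the loader calls at the bottom show what unittest.main() does under the hood.

```python
import unittest

class TestRowByRowExtract(unittest.TestCase):
    # class and test names are illustrative placeholders
    def test_flux_conservation(self):
        # a real test would extract a known image and compare total flux
        self.assertAlmostEqual(sum([0.5, 0.5]), 1.0)

# running via the loader (equivalent to unittest.main() in a __main__ block)
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestRowByRowExtract)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```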

Please also remove the hardcoded set of CTE params from the code and add them as an argument to the main function correct_image_via_model. I will then merge the PR and start from there to integrate this code into the pipeline processing.

weaverba137 commented 8 months ago

@julienguy, @schlafly, I'm actually going to suggest the opposite: do not add the extra if __name__... boilerplate to the test. That is a leftover from the days of python setup.py test. The current best practice is to run:

pytest py/desispec/test/test_rowbyrowextract.py

You can still optionally wrap the tests in unittest.TestCase though for consistency with other tests in the package.

schlafly commented 8 months ago

In my own testing I ran just this module with Ben's invocation, and the unit tests are getting run as I've written it: https://github.com/desihub/desispec/actions/runs/7463894705/job/20309610108?pr=2144#step:5:50 That is, I can wrap it for consistency, but it's not clear to me that that buys anything.

I have a version locally that I haven't pushed yet that allows the arguments to be passed by hand but falls back to the dictionary, which we can delete later. If I totally delete it I won't be able to run nights anymore.

schlafly commented 8 months ago

I went ahead and removed the hard-coded dictionary; I had forgotten that my current mechanism for running nights involved hacking the preproc files, so my objection wasn't relevant.

julienguy commented 8 months ago

OK, thanks Ben. pytest py/desispec/test/test_rowbyrowextract.py is working fine. I see the hardcoded params are removed. I am going to merge after sorting out the documentation error.

sbailey commented 8 months ago

I think the failing documentation test is because the new modules haven't been added to doc/api.rst. I think the procedure is to run the following from the top-level desispec directory:

desi_api_file --overwrite

and then commit the updated doc/api.rst file.