desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License
33 stars 24 forks source link

extracted spectra with large inter-camera breaks leading to wrong redshifts #2193

Open moustakas opened 4 months ago

moustakas commented 4 months ago

Cross-posted from https://github.com/desihub/redrock/issues/286 with additional information requested by @sbailey.

I measure the spectral break across cameras with this bit of code: https://github.com/moustakas/fastspecfit-projects/blob/586c9341cc745e90f6ac9036fc9ca1f088ba2c07/redrock-templates/stack-templates#L3688-L3819

TL;DR: I just measure the median and robust sigma of the pixel values in the last N pixels of the blue camera, the first and last N pixels of the red camera, and the first N pixels of the NIR camera, where I take N=500. Fancier methods are of course possible.

The output catalog for the set of SV1 tiles I've been on is here: /global/cfs/cdirs/desicollab/users/ioannis/fastspecfit/redrock-templates/stacks/powerlaw-vitiles-NMF-0.5b-zscan01.fits

I then measure the S/N of the blue-red and red-NIR breaks like this

allfit = Table(fitsio.read(fitfile))
I = np.where((np.sum(allfit['MEDFLUX'] != 0, axis=1) == 4.) * (np.sum(allfit['MEDFERR'] != 0, axis=1) == 4.))[0]
fit = allfit[I]

break_cam12 = np.abs(fit['MEDFLUX'][:, 0] - fit['MEDFLUX'][:, 1])
break_cam23 = np.abs(fit['MEDFLUX'][:, 2] - fit['MEDFLUX'][:, 3])

break_sigma_cam12 = np.sqrt(fit['MEDFERR'][:, 0]**2 + fit['MEDFERR'][:, 1]**2)
break_sigma_cam23 = np.sqrt(fit['MEDFERR'][:, 2]**2 + fit['MEDFERR'][:, 3]**2)

snr12 = break_cam12 / break_sigma_cam12
snr23 = break_cam23 / break_sigma_cam23

The resulting scatter in the derived redshifts (relative to the "true" VI redshifts) vs S/N is:

image

Here, the redshift residuals are defined as

dz = C_LIGHT * np.abs(fit['Z'] - fit['VI_Z'])/(1.+fit['VI_Z'])

Selecting the ones with dz>4e3 and S/N>20 (in either inter-camera break), I get 31 objects:

TILEID FIBER      TARGETID     PETAL         DZ           SNR12      SNR23
int64  int64       int64       int64      float64        float32    float32
------ ----- ----------------- ----- ------------------ ---------- ----------
 80605  2675 39627712859478276     5  13753.30817296143   8.700717   29.26009
 80605  2687 39627712859474415     5  38854.11544979606  1.1228857  23.132587
 80605  3994 39627676687795663     7 436554.85627168004    54.0274  20.965382
 80605  4891 39627664637561841     9 32645.718999693385  1.1056823  39.386738
 80606  2679 39627712859473332     5  817464.1804844743  0.9607095  35.065628
 80606  2686 39627706823870511     5  35705.31737173035 0.88407767  29.169683
 80606  2689 39627712859477962     5 16036.041128356886  2.6505814  31.983599
 80606  2697 39627712863666614     5  21023.25121737598   3.396179  22.286367
 80607  3232 39633345029606967     6  280640.8676758417  20.559772  1.6178108
 80607  4576 39633331528143396     9 197633.31315224644  23.429968   8.169225
 80608  2680 39633354911387057     5  939649.5675124964 0.80839413  48.274033
 80608  2687 39633358170359760     5  399206.2962223898  0.6406391  34.025684
 80609  2678 39627877821449265     5  929208.7517584246  3.2488225  36.409405
 80609  2687 39627877821449301     5  54330.18166982945  3.4333866   28.68696
 80609  2703 39627865758634038     5   754652.965716283  10.944497  30.361032
 80609  3994 39627841612022977     7 481978.10488719324   51.28814  29.444307
 80609  4313 39627835576420571     8 503298.03277462174   35.11627  21.922812
 80610  2678 39627877821449509     5  467439.9591521631  1.1963993  28.193665
 80610  2679 39627877821448308     5 149707.64624787192  1.5318997  28.692732
 80610  2680 39627871785847856     5  69789.26933701255   6.789499  28.936642
 80610  2681 39627871785848127     5  307332.3223830654   7.634968   31.91988
 80610  3200 39627859731418084     6  293620.6087330402  20.561007 0.91257524
 80610  3202 39627853691618995     6 429456.14054370916  20.724264  2.0358648
 80613   329 39633317678547124     0 287634.67595368973  30.386202  5.7558904
 80613   843 39633324628509035     1 20877.824262461487   35.37756  23.553421
 80613  3166 39633351656605321     6  840471.0883434997  21.977762  1.7826381
 80613  3452 39633351652410612     6  44561.26018253542  2.1015499  23.675735
 80613  4065 39633328118170553     8 390053.23382970516  13.166382  20.160168
 80613  4188 39633328109783281     8  81577.70880682596  13.743748  63.802498
 80613  4855 39633331528139161     9  7741.116304992681  21.822968   2.575755
 80613  4891 39633331528140253     9   308855.930640626  22.210707  13.735623

Finally, here are four of these objects which I showed in my presentation on the data telecon today:

Screenshot 2024-03-12 at 6 27 49 PM
araichoor commented 4 months ago

if useful, I ve written a small script to loop on all cframes from a night, look at the sky fibers, and measure the median value of the 500 pixels on both edges of each amplifier.

I ll polish a bit that, and I ll then share more results, but here s an example for iron of one night (from early sv1, included in the bottom right spectrum of @moustakas plot above (which has fiber=2680). peramp-median-iron-20201216

I m plotting in different colors the difference at each boundary (with some y-offset for clarity). we clearly see around fiber 2700 the blue amp of the z-camera having issue, with the -0.5 offset between the r- and z-camera, as can be seen in the spectrum above. that feature is at least present from 20201214 to 20210107; so I guess that ~all spectra in that fiber range from that period have a similar r-z break.

when flipping through nights, we see some features appearing / disappearing. that approach allows one to know post-facto if some fibers from a night have some issue.

araichoor commented 4 months ago

if people are curious, I ve generated (large...) pdf files: https://data.desi.lbl.gov/desi/users/raichoor/spectro/iron/peramp-medians/peramp-median-iron-sv.pdf (74M) https://data.desi.lbl.gov/desi/users/raichoor/spectro/iron/peramp-medians/peramp-median-iron-main-y1.pdf (175M)

I currently select sky fibers with:

    sel = d["OBJTYPE"] == "SKY"
    sel &= (d["FIBERSTATUS"] & fibermask["POORPOSITION"]) == 0

but I m happy to implement something more relevant, if suggested.

I notice that for the main survey, the fiber sampling is sparser: I think it s because we then turned on the use of stuck fibers for sky; hence the ~same fibers are often use.

so I ll also explore if considering all spectra, not just sky spectra, could provide something meaningful.

sybenzvi commented 4 months ago

@sbailey noticed that we are missing a batch of Nightwatch QA from the start of SV1 so at his request I reprocessed several missing nights at NERSC: 20201216, 20201217, and 20201218. The NW calibration files covering that period were a little patchy. I did my best to clean it up quickly, but a few of the reported errors, like a wavelength (DY) offset in the trace shifts of Z8, could be spurious.

sbailey commented 4 months ago

On z5 there is a masked hot column around fiber 2663, including a large masked "bloom" at the bottom of the CDD impacting many fibers, e.g. https://nightwatch.desi.lbl.gov/20201216/00068227/preproc-z5-00068227-4x.html (thanks to @sybenzvi for generating those nightwatch pages).

image

@araichoor do your plots consider masked pixels? I suspect that much of the red/purple offset problem in your plots is coming from this CCD masked region which should have propagated into masked fluxes. From a quick look it also appears that the fiberflat is messed up but not masked in that region. I have another meeting now, but that could use more investigation too.

araichoor commented 4 months ago

"do your plots consider masked pixels?" => I exclude pixels with ivar=0, so that probably also removes the masked pixels, right?

I set flux=np.nan to all ivar=0 pixels, then use a np.nanmedian(); looks like this:

            fs, ivs = fitsio.read(fn, "FLUX"), fitsio.read(fn, "IVAR")
            fs[ivs == 0] = np.nan
            d["MEDFLUX_{}_AMPAB_BLUE".format(camera.upper())] = np.nanmedian(
                fs[:, :500], axis=1
            )
sbailey commented 4 months ago

When directly comparing flux calibrated signal spectra, I think we should focus on a noise-weighted difference comparison, rather than a comparison of medians. e.g. median( (flux1 - flux2) / sqrt(err1^2+err2^2) ) rather than (median(flux1) - median(flux2)) / error for 500 pix. That should reduce the false positive rate from spectra that have a genuine significant slope/break in the overlap. The wavelength grids are aligns such that B/R share 51 pixels of overlap, and R/Z have 126 pixels. The downside is that it is fewer pixels. Regardless, @moustakas original method identified some obvious problems to investigate.

For sky spectra it is a different story, since the correct answer is supposed to be noise about a flat flux=0, so medians of ranges of pixels might be more valid (less susceptible to true signal details, since signal=0). Thanks @araichoor for posting the multi-night analysis. More to investigate...

sbailey commented 4 months ago

Half of the examples from @moustakas's list and the prominent feature from @araichoor's plot come from z5 fibers 2675-2699. Looking at CCD data, there is a dark shadow from the saturated columns impacting many fibers. This CCD was replaced 2022-11-10, but unfortunately was used for almost 2 years. In the plot below, red are the fibers we are already masking, and cyan are additional problematic ones that aren't yet masked:

image

This isn't captured by the darks, presumably because it is a non-linear feature from saturated pixels so simply extrapolating from one exposure time to another for a feature that is variable anyway doesn't work.

Looking at the spectra, there is a clear offset and we could consider masking ~25 fibers (x-direction) for about half the wavelengths (y-direction). The white vertical band is the already masked fibers; the reddish excess to the right of that is unphysical negative flux offset.

image

The CCD feature is time-variable so we might put a custom mask night-by-night if we are trying to preserve some of these fibers. But yuck! Thanks to @julienguy for consultation on this.

araichoor commented 2 months ago

@sbailey, if useful for jura:

I ve browsed the pdf files I posted above, and I noted the following "features":

# STRENGTH NIGHTMIN NIGHTMAX APPROX_FIBER AMPS  COMMENT
small   20210514    ++          4000-4175   b_ampcd_blue/b_ampab_red    wiggles (cte?)
mild    20210204    20210224    3000-3175   z_ampcd_blue/z_ampab_red    spike
mild    20211026    20211106    3000-3500   z_ampcd_blue/z_ampab_red    offset
small   20211126    ++          2000-2175   b_ampcd_blue/b_ampab_red    wiggles (cte?)
small   20211203    20211212    0750-1000   z_ampab_blue/r_ampcd_red    offset
strong  20211212    20211212    4500-5000   r_ampab_blue/b_ampcd_red+z_ampab_blue/r_ampcd_red   spike

comments:

julienguy commented 2 months ago

The following plot shows the offset in the counts in preprocessed images to the right of the bright columns of z5 for some random dark time exposures with about 1000s exp. time.

Figure_1

There is some variation of the offset from exposure to exposure. This is the reason why it's hard to fix this. I suggest we simply mask out more fibers. Currently we have BADCOLUMNFIBERS: 2663:2675 . I suggest we extend that to BADCOLUMNFIBERS: 2663:2692, the upper limit being given by the Y1 LSS catalog mask. ( see https://data.desi.lbl.gov/desi/survey/catalogs/Y1/LSS/iron/unique_badfibers.txt )

Note the offset in the above figure is measured in CCD rows below the region used for spectroscopy. The effect is smaller in the CCD rows used for the spectral extractions.

julienguy commented 2 months ago

The change has been committed to $DESI_SPECTRO_CALIB/spec/sm9/sm9-z.yaml

sbailey commented 2 months ago

@julienguy thanks for investigating and picking the specific extended range of fibers to mask and committing that. I'm going to flag this as "done" for the purposes of Jura, but leave the ticket open in case we explore other cases (which could also spin off ot other dedicated tickets).