desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License

Better proxies for redshift success in a SCORES HDU & Clustering Catalog redshift efficiency. #1004

Open michaelJwilson opened 4 years ago

michaelJwilson commented 4 years ago

In this tech note:

https://desi.lbl.gov/DocDB/cgi-bin/private/RetrieveFile?docid=4723;filename=sky-monitor-mc-study-v1.pdf;version=2

@julienguy shows that an intuitive measure of redshift success, the cumulative S/N of spectral features narrower than 100 A (i.e. those fine enough to provide sufficient redshift precision), does indeed correlate strongly with redrock redshift success.
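
That cumulative narrow-feature S/N can be sketched as follows, assuming a simple running-mean continuum subtraction to isolate features finer than 100 A (the function name, window choice, and toy spectrum are all illustrative, not the tech note's exact eqn. (1)):

```python
import numpy as np

def narrow_feature_snr(wave, template_flux, ivar, width=100.0):
    """Cumulative S/N of template features narrower than `width` Angstroms.

    A sketch: subtract a ~width running mean so only fine spectral features
    (those carrying the redshift information) survive, then sum their S/N
    in quadrature against the per-pixel inverse variance.
    """
    dwave = np.median(np.diff(wave))            # pixel scale in Angstrom
    npix = max(int(width / dwave), 1)           # ~100 A smoothing window
    kernel = np.ones(npix) / npix
    continuum = np.convolve(template_flux, kernel, mode='same')
    features = template_flux - continuum        # high-pass filtered template
    core = slice(npix, len(wave) - npix)        # avoid convolution edge effects
    return np.sqrt(np.sum(features[core]**2 * ivar[core]))

# toy example: flat continuum plus one narrow (~2 A sigma) emission line
wave = np.arange(3600.0, 9800.0, 0.8)
flux = np.ones_like(wave)
flux += 5.0 * np.exp(-0.5 * ((wave - 7000.0) / 2.0)**2)
ivar = np.full_like(wave, 4.0)                  # sigma = 0.5 per pixel
snr_line = narrow_feature_snr(wave, flux, ivar)
print(snr_line)
```

A flat spectrum gives essentially zero by construction, so only genuine narrow features contribute to the score.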

Currently, the SCORES HDU present in e.g. cframe, frame and spectra files contains, among other things, the median and mean S/N in each arm. It would be better to add this 'templateSNR' to the spectra files (where measurements from all three arms are present), given that it has a much stronger correlation with redshift success.

This is the first step in a plan to measure the clustering spectroscopic efficiencies, as discussed with @julienguy and @ashleyjross:

1) Calculate this 'templateSNR' for each DESI spectrum, given any template and a FIBER-EXP-ID or a FIBER-TILE-ID.

2) Post-process spectra after redrock, calculating this 'templateSNR' for the specific redrock best-fit template.

3) Calculate the clustering spectroscopic efficiency map, described next, by calculating this 'templateSNR' for samples from an ensemble of (z, mag., OII flux):

 _Each tile, each fiber, a redshift efficiency as a function of redshift and target magnitude, per target class. This could be for instance in the form of fits images in 3D (fiber x redshift x magnitude), with one HDU per target class, and one fits file per tile_

4) Use the SV redrock best-fit template data (twice as deep) as an ensemble for each tracer, e.g. ELGs. Sample (m, OII flux) points, or take the ensemble average, to get the tracer redshift efficiency for each fiber on an exposure, in bins of redshift.

5) If necessary, fall back to the computationally less efficient approach of adding the sky / readnoise contribution to the template spectrum and running redrock, rather than appealing to this template SNR correlation, e.g. on 'difficult' plates that don't follow first-order trends.

6) Each FIBER-EXP-ID will be masked independently, based e.g. on our ability to model the redshift efficiency on that plate.
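
The per-tile efficiency map quoted above (fiber x redshift x magnitude, one HDU per target class) could look roughly like this. The dimensions, bin edges, class names and the nearest-bin lookup are all illustrative assumptions:

```python
import numpy as np
from astropy.io import fits

# illustrative dimensions; a real map would have 5000 fibers and finer bins
nfiber, nz, nmag = 10, 4, 3
zbins = np.linspace(0.6, 1.6, nz + 1)        # assumed redshift bin edges
magbins = np.linspace(20.0, 24.5, nmag + 1)  # assumed magnitude bin edges

rng = np.random.default_rng(0)
hdus = [fits.PrimaryHDU()]
for target_class in ('ELG', 'LRG', 'QSO'):   # one HDU per target class
    zeff = rng.uniform(0.5, 1.0, size=(nfiber, nz, nmag))   # placeholder values
    hdus.append(fits.ImageHDU(data=zeff.astype(np.float32), name=target_class))
hdul = fits.HDUList(hdus)                    # one such fits file per tile

def lookup_zeff(hdul, target_class, fiber, z, mag):
    """Redshift efficiency for one fiber at (z, mag), via nearest-bin lookup."""
    iz = int(np.clip(np.digitize(z, zbins) - 1, 0, nz - 1))
    im = int(np.clip(np.digitize(mag, magbins) - 1, 0, nmag - 1))
    return float(hdul[target_class].data[fiber, iz, im])

print(lookup_zeff(hdul, 'ELG', fiber=3, z=1.1, mag=22.0))
```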

For propagation to the random catalog and therefore clustering estimators, we first:

Match each random (z, ra, dec) to a (fiber-)reachable FIBER-EXP-ID. For the random list of a given FIBER-EXP-ID, we add a ZSUCCESS bit for each tracer class, such that the fraction of randoms with the bit set matches the zeff predicted above for the redshift associated with the random.
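
A minimal sketch of that bit assignment, assuming each random already carries its predicted zeff (the bit value and function name are hypothetical):

```python
import numpy as np

ZSUCCESS_ELG = 2**0   # hypothetical per-tracer bit assignment

def assign_zsuccess(zeff, bit=ZSUCCESS_ELG, seed=42):
    """Set a ZSUCCESS bit on each random with probability zeff.

    zeff: per-random predicted redshift efficiency, matched to the random's
    FIBER-EXP-ID and redshift.  The expected fraction of randoms with the
    bit set then matches zeff, as the estimator requires.
    """
    rng = np.random.default_rng(seed)
    keep = rng.random(len(zeff)) < zeff
    return np.where(keep, bit, 0).astype(np.int32)

# toy usage: 100k randoms, all with predicted efficiency 0.8
zeff = np.full(100_000, 0.8)
bits = assign_zsuccess(zeff)
print(((bits & ZSUCCESS_ELG) > 0).mean())   # close to 0.8
```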

TODO: document how this relates to the eBOSS approach.

michaelJwilson commented 4 years ago

I started a draft pull request that looks at a basic outline of this in a notebook.

We talked mostly about working in uncalibrated flux units (counts/Angstrom). I'm now confused as to why we wouldn't just use the calculated IVAR to compute the template SNR in calibrated flux units: isn't this essentially the denominator of eqn. (1)? If the issue is the source IVAR contribution, would it be easiest to produce an IVAR array both with and without the source contribution?
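
To make the question concrete, here is a toy sketch of templateSNR computed as sum(T^2 * ivar) in calibrated units, comparing an IVAR with and without the source Poisson contribution. The noise model, calibration constant, and variable names are all assumptions for illustration, not eqn. (1) from the tech note:

```python
import numpy as np

def template_snr(template_flux, ivar):
    """tsnr = sqrt( sum_i T_i^2 * ivar_i ), in calibrated flux units."""
    return np.sqrt(np.sum(template_flux**2 * ivar))

# toy noise model in counts: sky + readnoise, optionally + source Poisson
rng = np.random.default_rng(1)
npix = 2000
calib = 0.1                                 # counts -> calibrated flux (assumed constant)
sky_counts = np.full(npix, 40.0)
readnoise2 = 9.0                            # readnoise variance, counts^2
source_counts = rng.uniform(0.0, 5.0, npix)

var_sky = sky_counts + readnoise2           # source-free variance
var_tot = var_sky + source_counts           # including the source Poisson term

# propagate to calibrated flux units: flux = counts * calib
ivar_sky = 1.0 / (var_sky * calib**2)
ivar_tot = 1.0 / (var_tot * calib**2)

template_flux = source_counts * calib
print(template_snr(template_flux, ivar_sky),   # source-free IVAR
      template_snr(template_flux, ivar_tot))   # IVAR with source contribution
```

The source term only lowers the IVAR, so the two versions bracket the answer; which one matches the tech note's denominator is exactly the question above.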

In any case, it's probably worth talking through this notebook together, to see where I'm heading in the right direction and where I'm not.