Missing emission lines in galaxy templates

londumas commented 6 years ago

Looking at the residuals of galaxies in DR7, DR12 and DR14, we can see that some lines are missing from the redrock galaxy templates, which were created from desisim galaxies. The following plots show these lines for the three data sets. The last plot shows a line that is present in the templates but not observed in the data.

The wavelengths, estimated by eye, are: 9533, 9071, 7137, 6301, 2496 Å

DR7 dr7_galaxy_missing_lines

BOSS boss_galaxy_missing_lines

DR14 eboss_dr14_galaxy_missing_lines

eboss_dr14_galaxy_missing_lines_otherline3

londumas commented 6 years ago

Here are some high SNR examples:

plate, mjd, fiber
7275, 57093, 304
7275, 57093, 810

moustakas commented 6 years ago

The LRG templates do not have emission lines, but I haven't dropped in the ELG emission-line model because the line-ratios need to be AGN-like.
The ELG emission lines can be easily tuned. If you can report here the rest-frame wavelengths of the lines that are in the templates but not in the data (and vice versa), I can look into this.

But more generally: these templates will never be as good as empirical templates that we build from the data themselves. My vision has always been that we will generate a pre-survey template set from SV observations, and that we'll periodically update our templates over the course of the survey as we probe more of galaxy/QSO parameter space. I'm sure other opinions abound, but this is a larger discussion that I would welcome.

ngbusca commented 6 years ago

@moustakas would it make sense to build them from eBOSS data? What are those empirical models like? PCA or something else?

moustakas commented 6 years ago

I think eBOSS would be great. I hadn't previously pursued using eBOSS spectra because the requisite data weren't public, but I'm sure the public samples are large enough now.

I'm not a fan of PCA because stars are never negative. Non-negative matrix factorization or archetype methods (see, e.g., this notebook) would be preferable.
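To illustrate the non-negativity argument, here is a minimal sketch of non-negative matrix factorization using the standard Lee-Seung multiplicative updates. It is not the code used for any DESI or redrock templates: the toy "spectra", the wavelength grid, and the function name `nmf` are all made up for this example. Unlike PCA eigenvectors, which generally take both signs, the basis spectra in `H` stay non-negative by construction.

```python
import numpy as np

def nmf(V, n_components, n_iter=500, eps=1e-12, seed=0):
    """Factor a non-negative matrix V (n_spectra x n_pixels) as W @ H
    using Lee-Seung multiplicative updates (Frobenius-norm objective)."""
    rng = np.random.default_rng(seed)
    n_spec, n_pix = V.shape
    # Strictly positive starting point; the updates below only multiply
    # by non-negative factors, so W and H can never become negative.
    W = rng.random((n_spec, n_components)) + 0.1
    H = rng.random((n_components, n_pix)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data (hypothetical): random non-negative mixtures of two smooth shapes.
wave = np.linspace(3600.0, 9800.0, 200)
basis_true = np.vstack([
    1.0 + 0.5 * np.exp(-0.5 * ((wave - 5000.0) / 300.0) ** 2),
    0.2 + wave / wave.max(),
])
coeffs_true = np.random.default_rng(1).random((30, 2))
spectra = coeffs_true @ basis_true

W, H = nmf(spectra, n_components=2)
rel_err = np.linalg.norm(spectra - W @ H) / np.linalg.norm(spectra)
print(f"relative reconstruction error: {rel_err:.3e}")
print("basis spectra non-negative:", bool((H >= 0).all()))
```

Real spectra would also need per-pixel inverse-variance weights and masking, which this sketch leaves out.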

The approach I've used for the current set of (galaxy) templates is to model the available data (spectra + broadband photometry, e.g., from DEEP2 for ELGs) using stellar population synthesis models for the continuum + Gaussian emission-line fitting. This model-based but empirically constrained approach yields spectral templates with the necessary spectral resolution and wavelength coverage for DESI. Archetypes or basis spectra can then be generated from this empirically constrained parent template set, as in redrock.

But there are many other subtle issues to consider, like differences in the wavelengths of the near-UV absorption lines due to outflows / winds vs the nebular emission lines, which will show up as systematic redshift errors. However, I'm sure many of these issues have been worked on in eBOSS.

My point is that I think we need to gradually move away from the templates I built which we've been using the past N years and get ready to use "real", DESI-like data.

londumas commented 6 years ago

@moustakas, do you need only the wavelength or also the distribution of the strength of these lines?

I agree that at the very least we have to compare the templates with DESI-like data (=eBOSS). That is why I have looked at the residuals. If they had been flat enough, we could simply continue with the current templates. Unfortunately, we are currently losing sensitivity because some lines are missing, some are spurious, and some have a different distribution of strengths. But it is already very good.

moustakas commented 6 years ago

At minimum the wavelengths of the lines. In principle we can tune the relative strengths but I'd be surprised if this affected the redshift results in a measurable way. But I've been known to be wrong...

londumas commented 6 years ago

@moustakas, do you need the wavelengths in vacuum or in air? In vacuum, I suppose.

moustakas commented 6 years ago

Either, but vacuum is probably easiest.
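For reference, converting between the two conventions is a one-liner. The sketch below uses the refractive-index formula commonly attributed to Morton (2000), which is valid above roughly 2000 Å. The function name `vacuum_to_air` is just a label for this example and is not taken from desisim or redrock.

```python
def vacuum_to_air(wave_vac):
    """Convert a vacuum wavelength in Angstrom to its air wavelength.

    Uses the Morton (2000) refractive index of standard air; only
    meant for wavelengths above about 2000 Angstrom.
    """
    s2 = (1.0e4 / wave_vac) ** 2  # squared wavenumber in inverse microns
    n = (1.0 + 0.0000834254
         + 0.02406147 / (130.0 - s2)
         + 0.00015998 / (38.9 - s2))
    return wave_vac / n

# Vacuum wavelengths (Angstrom) as quoted in this thread.
for name, wave_vac in [("[SIII]", 9533.0), ("[SIII]", 9070.0),
                       ("[ArIII]", 7137.7), ("[OI]", 6302.05)]:
    print(f"{name:8s} vacuum {wave_vac:8.2f}  air {vacuum_to_air(wave_vac):8.2f}")
```

The shift is about 1.7 Å at 6300 Å and grows to about 2.6 Å at 9500 Å, which is why the familiar air-wavelength labels ([OI] 6300, [SIII] 9531) differ from the vacuum values.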

londumas commented 6 years ago

In the same rest frame as the desisim galaxies:

Missing:
[SIII]9532 -> 9533
[SIII]9069 -> 9070
[ArIII]7138 -> 7137.7
[OI]6302 -> 6302.05

Should not exist: 2496

moustakas commented 6 years ago

(This issue should really be posted as a desisim issue.)

I never quite finished documenting how I incorporate emission lines into the ELG and BGS templates, but here are some notes:

All the line-strengths are relative to H-beta. The hydrogen and helium recombination lines (assuming a helium-to-hydrogen abundance ratio of 0.0897) are built assuming case B and listed here. One important piece of physics missing from these emission-line ratios is intrinsic dust attenuation, but there are reasons why I have not yet included it.
The somewhat limited set of forbidden lines included in the model is listed here. The line-ratios of the doublets (e.g., [NII] 6584/6548) are set by atomic physics, while the line-ratios relative to H-beta (e.g., [NII]/H-beta) are drawn from Gaussian mixture models fitted to data from various surveys (see this notebook). I have confirmed that the forbidden emission-line sequences are broadly consistent with observations of galaxies to z~2, so they should be good for early DESI observations.
In order to add additional lines like the [SIII] 9069,9532 doublet (which depends on electron temperature), [ArIII] 7138, [OI] 6302 (which is sensitive to shocks), and other lines (like [NeIII] 3869), we just need to choose line-ratios, add them to the forbidden_lines.ecsv file, and then add a few lines of code to desisim.templates.EMSpectrum (see here).
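The "ratios relative to H-beta plus Gaussian profiles" idea in the notes above can be sketched in a few lines. This is an illustration only. It does not reproduce the `desisim.templates.EMSpectrum` API or the contents of forbidden_lines.ecsv, and every line ratio in the table is a placeholder chosen for the example, not a measured or recommended value. The wavelengths of the four added lines are the ones reported in this thread.

```python
import math

# (name, rest-frame vacuum wavelength in Angstrom, flux relative to H-beta).
# All ratios are PLACEHOLDERS for illustration, not desisim values.
LINES = [
    ("H-beta",  4862.68, 1.00),
    ("[OI]",    6302.05, 0.05),
    ("[ArIII]", 7137.7,  0.03),
    ("[SIII]",  9070.0,  0.10),
    ("[SIII]",  9533.0,  0.25),
]

def emission_spectrum(wave, lines, hbeta_flux=1.0, sigma_kms=75.0):
    """Sum of Gaussian lines on the rest-frame grid `wave` (Angstrom).

    Each line carries an integrated flux of ratio * hbeta_flux and a fixed
    velocity width, so its width in Angstrom scales with wavelength.
    """
    c_kms = 299792.458
    flux = [0.0] * len(wave)
    for _name, center, ratio in lines:
        sigma = center * sigma_kms / c_kms
        norm = ratio * hbeta_flux / (sigma * math.sqrt(2.0 * math.pi))
        for i, w in enumerate(wave):
            flux[i] += norm * math.exp(-0.5 * ((w - center) / sigma) ** 2)
    return flux

step = 0.2  # Angstrom per pixel (assumed grid for this example)
wave = [4800.0 + step * i for i in range(24000)]
flux = emission_spectrum(wave, LINES)

# Sanity check: the integrated flux should equal the sum of the ratios.
print(f"integrated flux: {sum(flux) * step:.4f}")
print(f"sum of ratios:   {sum(r for _, _, r in LINES):.4f}")
```

In a sketch like this, adding a line amounts to adding one row to the table. Drawing the ratios from a fitted distribution instead of fixing them would replace the constant third column.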

In case we want to add even more forbidden lines (about which I welcome your thoughts, @londumas), we could use galev_emlines.txt, which I obtained from R. Kotulla a few years ago, although some/many/most of the line-ratios are really only suitable for low-mass, low-metallicity starbursts.

Alternatively, FSPS also has a reasonably complete line-list.

Let me know how you'd like to proceed.

londumas commented 6 years ago

This is fixed thanks to @moustakas: https://github.com/desihub/desisim/pull/426 and https://github.com/desihub/desisim/pull/424

The new templates are in PR in https://github.com/desihub/redrock-archetypes/pull/7 and in https://github.com/desihub/redrock-templates/pull/8

Testing on some DR14 ELG+LRG truth table plate, I get:

### Master
                   ntarg   good  fail  miss  lost
BAD                  379   0.79 40.90  0.00 58.31
GALAXY              5111  96.73  0.51  1.00  1.76
QSO                   89  91.01  2.25  3.37  3.37
STAR                 302  98.68  0.66  0.00  0.66

### new templates + new archetypes
                   ntarg   good  fail  miss  lost
BAD                  379   0.79 38.52  0.00 60.69
GALAXY              5111  96.99  0.47  0.94  1.60
QSO                   89  91.01  2.25  3.37  3.37
STAR                 302  98.68  0.66  0.33  0.33

Here is what we used to have and what we get now for one spectrum:

spec_old_model

spec_new_model

sbailey commented 6 years ago

Thanks for the multi-repo fix. Closing this ticket.

desihub / redrock

Missing emission lines in galaxy templates #111