desi_group_spectra large number of warnings from incorrectly merging per-exposure keywords

segasai commented 6 months ago

Hi,

While using desi_group_spectra I'm getting a large number (several tens) of warnings like this

WARNING: MergeConflictWarning: Cannot merge meta key 'BBKGMAXA' types <class 'float'> and <class 'float'>, choosing BBKGMAXA=0.9176039876017302 [astropy.utils.metadata] WARNING: MergeConflictWarning: Cannot merge meta key 'BBKGMINB' types <class 'float'> and <class 'float'>, choosing BBKGMINB=-0.06447514057736013 [astropy.utils.metadata]

which I believe come from https://github.com/desihub/desispec/blob/85208f5d471c780cd4d59d32b648a0f643e1c92e/py/desispec/pixgroup.py#L727 (i.e. astropy.table.vstack) Since vstack has metatada_conflicts='silent' option, I propose to put it in, as the warnings are too numerous and frequent to be useful I think.

Comments ? S

sbailey commented 6 months ago

I agree that normal desi_group_spectra operations should not generate a gadzillion warnings. However, I think it would be better for desispec.pixgroup.frames2spectra to proactively remove header keywords that aren't meaningful after merging, rather than just silencing the warnings, which ends up picking the last value and putting that into the output, which leads to misleading/incorrect output.

That being said, the list of conflicting header keywords is rather long:

[login06 daily] grep MergeConflict $CFS/desi/spectro/redux/daily/tiles/cumulative/1132/20240313/logs/spectra-0-1132-thru20240313.log | awk '{print $7}' | sort | uniq | wc -l
212

Options:

original suggestion: set metatada_conflicts='silent' and move on with our lives; that will get rid of the warning messages and the output won't be any more incorrect than it already is
grab that list of current warnings, hardcode those keywords to be removed before stacking, deal with any new keywords that pop up in the future
inverse: identify any keywords that do not conflict after stacking, and remove any keywords that aren't in that list (i.e. propagation is opt-in rather than opt-out). Should be the same as (2) for current data, but will handle any new exposure-specific keywords that may appear in the future.
auto-identify and remove conflicting keywords before vstack. The disadvantage here is that if we a making a spectra object out of a single exposure, different keywords will propagate compared to what we get from making a spectra object out of N>1 exposures. Those keywords won't be incorrect, but it's a hassle to have a different data model for N=1 vs. N>1 exposures.

I've already spent way more time writing up this response than I was expecting so I'm warming up to the original option (1), but we should at least consider (2) or (3). I don't like (4) due to data model hassle.

sbailey commented 6 months ago

This appears to be a side effect of https://github.com/desihub/desispec/pull/2176 which added units to fibermap tables, but also propagated way more header keywords than we previously were, leading to these vstack metadata conflicts and misleading values in the coadded fibermap headers. Let's double check that, i.e. was it purposeful to propagate those extra keywords or an accidental side effect.

segasai commented 6 months ago

FWIW code like this :


def deconflict(Ts):
    collection = {}
    # collect all keywords across tables
    for curT in Ts:
        for k,v in curT.meta.items():
            if k not in collection:
                collection[k] = []
            collection[k].append(v)
    conflicted = []
    result= {}
    todel = []
    # identify keywords with more than 1 distinct value or always empty
    for k, v in collection.items():
        if len(set(v)) != 1:
            result[k] = [_ for _ in v if not isinstance(_,pyfits.card.Undefined)][0]
            conflicted.append(k)
        elif isinstance(v[0], pyfits.card.Undefined):
            # empty keywords
            todel.append(k)
    # substitute conflicted keywords and delete always empty keywords
    for curT in Ts:
        for k in conflicted:
            curT.meta[k]= result[k]
        for k in todel:
            del curT.meta[k]
    print ('conflicted keywords', conflicted)
    return Ts

when run on a couple of fibermap table, produces this output:

T1=atpy.Table().read('cframe-z9-00164680.fits')
T2=atpy.Table().read('cframe-z9-00164680.fits')
T=atpy.vstack(deconflict([T1,T2]))

conflicted keywords ['EXPID', 'COSMSPLT', 'VISITIDS', 'SEQUENCE', 'OBSERVER', 'LEAD', 'NIGHT', 'SEQSTART', 'DATE-OBS', 'TIME-OBS', 'MJD-OBS', 'STARTADJ', 'OPENSHUT', 'CLOSSHUT', 'ST', 'AIRMASS', 'FOCUS', 'TRUSTEMP', 'PMIRTEMP', 'DOMSHUTL', 'DOMEAZ', 'GUIDOFFR', 'GUIDOFFD', 'SUNRA', 'SUNDEC', 'MOONDEC', 'MOONRA', 'MOONSEP', 'MOUNTAZ', 'MOUNTDEC', 'MOUNTEL', 'MOUNTHA', 'MNTOFFD', 'MNTOFFR', 'PARALLAC', 'SKYDEC', 'SKYRA', 'TARGTDEC', 'TARGT ....] , produces no additional warnings and substitutes keywords that are different by their first (non-null) value. Whether that's exactly what's wanted I don't know. It's possible to separate separate keywords that may need different processing than picking up the first one.

sbailey commented 2 months ago

Fixed by PR #2302. Closing.

desihub / desispec

desi_group_spectra large number of warnings from incorrectly merging per-exposure keywords #2200