desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License

TSNR2 SCORES column potentially random order #1207

Open sbailey opened 3 years ago

sbailey commented 3 years ago

In desispec.tsnr.get_ensemble, the set of ensembles to add is found via

paths = glob.glob(dirpath + '/tsnr-ensemble-*.fits')

and the ensembles are added in that order. However, glob.glob does not guarantee any particular order, so the TSNR2 columns could appear in the SCORES table in an arbitrary order, making them a pain to stack across exposures. So far we've been "lucky" and all TSNR2 scores have come out in the same order, but this could change in the future on a different system (Perlmutter...) or after a Python upgrade.

The order could be guaranteed with

paths = sorted(glob.glob(dirpath + '/tsnr-ensemble-*.fits'))

but that will immediately introduce a mismatch between the column order of current files and that of future files.
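As a minimal sketch of the sorted variant, the snippet below creates empty placeholder files in a temporary directory and lists them in lexicographic order. The helper name list_ensemble_paths is hypothetical and not part of desispec; the file names assume the wildcard is the tracer name (bgs, elg, lrg, lya, qso).

```python
import glob
import os
import tempfile


def list_ensemble_paths(dirpath):
    """Return tsnr-ensemble-*.fits paths in lexicographic order.

    Hypothetical helper for illustration; not the desispec implementation.
    """
    pattern = os.path.join(dirpath, 'tsnr-ensemble-*.fits')
    # glob.glob returns matches in arbitrary (filesystem-dependent) order,
    # so sort explicitly to make the result reproducible.
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files in a deliberately scrambled order.
    for tracer in ['qso', 'bgs', 'lya', 'elg', 'lrg']:
        open(os.path.join(tmp, 'tsnr-ensemble-{}.fits'.format(tracer)), 'w').close()

    names = [os.path.basename(p) for p in list_ensemble_paths(tmp)]
    print(names)
```

Because all paths share the same directory prefix, sorting the full paths is equivalent to sorting the file names.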

I'll try to handle this possibility for desi_group_spectra and desi_coadd_spectra in a separate PR, but I think we should also enforce a standard TSNR2_* column order by adding a sorted(glob.glob(...)) in get_ensemble. If others agree, I think we should add that before the Denali run so that it will have the new stable sorted order going into the future.
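One way to make stacking robust to column order is to match columns by name rather than by position. The sketch below is an assumption about how that could look, not the actual desi_group_spectra or desi_coadd_spectra code: each SCORES table is modeled as a plain dict of column name to values, and the column names and numbers are made up for illustration.

```python
def stack_scores(tables):
    """Stack per-exposure score tables by column name, not by position.

    Each table is a dict mapping column name -> list of values (a stand-in
    for a real SCORES table). Hypothetical helper for illustration only.
    """
    columns = sorted(tables[0].keys())
    for i, table in enumerate(tables[1:], start=1):
        if sorted(table.keys()) != columns:
            raise ValueError('table {} has a different set of columns'.format(i))
    # Concatenate the values of each column across all tables, emitting
    # the columns in one canonical (sorted) order.
    return {name: [v for table in tables for v in table[name]] for name in columns}


# Toy data: the same two columns, written in a different order per exposure.
exp1 = {'TSNR2_ELG': [1.0, 2.0], 'TSNR2_BGS': [10.0, 20.0]}
exp2 = {'TSNR2_BGS': [30.0], 'TSNR2_ELG': [3.0]}

stacked = stack_scores([exp1, exp2])
print(stacked)
```

Matching by name keeps old and new files stackable even if their on-disk column order differs, at the cost of requiring every table to carry the same set of columns.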

michaelJwilson commented 3 years ago

Apologies for adding this unpredictability. I actually understood glob to be less predictable than you're suggesting.

The suggestion yields bgs, elg, lrg, lya, qso (not sure why), which seems entirely reasonable, and I'd agree that it should go in for Denali, assuming that's not an inconvenient amount of processing. I'd actually go further and walk back the auto-discovery, given how it could have bitten us the first time around. We might have a config file that stipulates the order, for instance.
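A rough sketch of the explicit-order idea, replacing auto-discovery with a stipulated list: the names TRACER_ORDER and ordered_ensemble_paths are hypothetical, the list is hard-coded here where a real version would presumably read it from a config file, and the file naming assumes the wildcard in tsnr-ensemble-*.fits is the tracer name.

```python
import os
import tempfile

# Hypothetical explicit ordering; in practice this would come from a config file.
TRACER_ORDER = ['bgs', 'elg', 'lrg', 'lya', 'qso']


def ordered_ensemble_paths(dirpath, tracer_order=TRACER_ORDER):
    """Return ensemble file paths in the stipulated tracer order.

    Illustrative only; raises if an expected file is missing rather than
    silently skipping it, so the resulting column set stays fixed.
    """
    paths = []
    for tracer in tracer_order:
        path = os.path.join(dirpath, 'tsnr-ensemble-{}.fits'.format(tracer))
        if not os.path.isfile(path):
            raise FileNotFoundError('missing ensemble file: ' + path)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    # Empty placeholder files, created in a scrambled order.
    for tracer in ['qso', 'lya', 'bgs', 'lrg', 'elg']:
        open(os.path.join(tmp, 'tsnr-ensemble-{}.fits'.format(tracer)), 'w').close()

    ordered = [os.path.basename(p) for p in ordered_ensemble_paths(tmp)]
    custom = [os.path.basename(p) for p in ordered_ensemble_paths(tmp, ['qso', 'bgs'])]
    print(ordered)
    print(custom)
```

With an explicit list, an extra file dropped into the directory no longer changes the set or order of columns; the trade-off is that adding a new tracer requires a config change.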

I can see an argument for walking back even further and relying on the afterburner providing dedicated tables updated every day, rather than writing things to disk in the cframes during procexp. The advantage to this seems to be the cadence, but it brings a number of disadvantages.