desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License
36 stars 24 forks source link

HPXNEST type inconsistency #1776

Closed weaverba137 closed 1 year ago

weaverba137 commented 2 years ago

In files grouped by healpixel, HPXNEST is supposed to denote that end-users should use the nested healpixel scheme. We have recently discovered that some FITS files encode the value as 'True ' (some number of spaces is standard), as opposed to T which is a proper FITS boolean. Which type is this keyword meant to have, and assuming it is meant to be a true boolean, where in the pipeline does it get converted to a string?

There is some danger that if end-users have code that interpret the HPXNEST keyword as a pure boolean, then a value of 'False ' would be interpreted as True, because strings with non-zero length are interpreted as True in Python.

sbailey commented 2 years ago

These were intended to be T (boolean), not 'True ' (string). Please point to example files that use 'True ' so that we can track down where that comes from.

weaverba137 commented 2 years ago

Here are a few examples:

fuji/healpix/special/dark/326/32637/coadd-special-dark-32637.fits
fuji/healpix/special/dark/326/32637/spectra-special-dark-32637.fits
guadalupe/healpix/main/bright/263/26371/coadd-main-bright-26371.fits
guadalupe/healpix/main/bright/263/26371/spectra-main-bright-26371.fits

these are just a few that were flagged by validating the data model. Since these files were chosen at random, it must be the case that most files in SPECPROD/healpix have this issue, such as redrock files.

The zcatalog files, zall-pix-SPECPROD and zpix-SURVEY-PROGRAM also have this issue.

geordie666 commented 2 years ago

I'm not sure I'm the correct person to generally diagnose this issue, although I'll certainly loop through the desitarget files specifically to check that the issue isn't being passed downstream from the target files.

Two immediate thoughts:

araichoor commented 2 years ago

from a quick look, I d say that those are not propagated from desitarget, and it looks like it s a matter of converting to string in the process.

this HPXNEST keyword is set here: https://github.com/desihub/desispec/blob/6f8caa05e0a39a768a93f8267f0b77d467080988/py/desispec/scripts/healpix_redshifts.py#L81-L87

and here in the desispec.scripts.tile_redshifts.write_redshift_script() function, where a headeropt string is built: https://github.com/desihub/desispec/blob/4885ad27a24a6cb08d1d3c6f3ec463302770fe89/py/desispec/scripts/tile_redshifts.py#L337-L339

that headeropt string is then is fed to desi_group_spectra, which interprets values here: https://github.com/desihub/desispec/blob/4885ad27a24a6cb08d1d3c6f3ec463302770fe89/py/desispec/scripts/group_spectra.py#L155-L165

so, maybe add another try/except case for booleans?

geordie666 commented 2 years ago

Thanks, Anand. I agree that this looks like the culprit:

if extra_header is not None: 
     for key, value in extra_header.items(): 
         headeropt += f' {key}={value}' 
araichoor commented 2 years ago

my understanding is that you ve to switch to string, as this is then fed to build a command line. but one could catch it back in group_spectra.py with an additional:

try:
    spectra.meta[key] = bool(value)
weaverba137 commented 2 years ago

I would not recommend that code, since almost everything that is not explicitly False, None, '', [], etc. is considered True. In other words, bool(value) will return True instead of throwing an exception for almost any value.

araichoor commented 2 years ago

I see, thanks for pointing that (it s quite counter-intuitive!). Maybe something like value.strip() == 'True'? you would know best..

weaverba137 commented 2 years ago

That has how Python has always worked, so counter-intuitive or not, this is extremely important to be aware of.

I think the idea that args.header is just a string is bad. You need to go back to the actual FITS header keywords as derived from astropy or fitsio, then examine their type and set the type of the keywords appropriately. Or just copy the keywords from astropy or fitsio rather than going through an intermediate stage of converting them to a string.

weaverba137 commented 1 year ago

@sbailey, this issue may be already on the way to being fixed as part of DR1 patching.

sbailey commented 1 year ago

I think the idea that args.header is just a string is bad. You need to go back to the actual FITS header keywords as derived from astropy or fitsio, then examine their type and set the type of the keywords appropriately. Or just copy the keywords from astropy or fitsio rather than going through an intermediate stage of converting them to a string.

For context, these are being parsed as strings because they are new keywords coming in via a command line option desi_group_spectra --header KEY1=VAL1 [KEY2=VAL2] ... so there is no prior information about what the type is. They are not being propagated from an input file.

I'll submit a PR soon to fix this.

weaverba137 commented 1 year ago

Ah, OK, thanx.

sbailey commented 1 year ago

Fixed by PR #2110 .