astropy / ccdproc

Astropy affiliated package for reducing optical/IR CCD data
https://ccdproc.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Handle duplicate keywords in headers in ImageFileCollection #464

Closed mwcraig closed 7 years ago

mwcraig commented 7 years ago

What happens:

If a FITS file that has the same keyword more than once in the header is added to an ImageFileCollection, building the collection fails as reported in #423. @boada was on the right track in his comment https://github.com/astropy/ccdproc/issues/423#issuecomment-261349459, which illustrates that the reported issue has to do with the contents of the headers, not with the fact that the files are multi-extension. This example file from @vrooje demonstrates the issue.

In that file, the keyword OBSTYPE appears twice in the primary header, once with value 'OBJECT' and again with value 1. Based on @boada's comment https://github.com/astropy/ccdproc/issues/423#issuecomment-261351015, I'm guessing the same issue occurred there, too.

Not sure yet what the expected behavior should be in this case. 😕

Able to reproduce with astropy 1.3, ccdproc v1.2.0.

MSeifert04 commented 7 years ago

Is a FITS header allowed to have multiple identical keywords for non-comments/history cards? I mean it's possible to create them but it's nearly impossible to handle them with astropy.io.fits. Maybe the correct thing to do would be to warn and ignore or error out in those cases.

MSeifert04 commented 7 years ago

With "it's possible to create them but it's nearly impossible to handle them" I mean:

>>> from astropy.io import fits
>>> h = fits.Header()
>>> h['obstyp'] = 'OBJECT1'
>>> h.append(('obstyp', 'OBJECT2'))
>>> h
OBSTYP  = 'OBJECT1 '
OBSTYP  = 'OBJECT2 '
>>> h['obstyp']  # just ignores the second one!
'OBJECT1'

mwcraig commented 7 years ago

@vrooje @boada @crawfordsm @MSeifert04 Any comments on the options below would be most welcome.

For reference, the FITS standard v3 says nothing that I could find about repeated keywords (admittedly just gave it a quick skim). It does not disallow repeated keywords, I think.

A few ways I can see handling it:

  1. Raise a more useful exception. Advantage: clearer than the current situation. Disadvantage: Gives the user no way to include a header like this in an ImageFileCollection.
  2. Raise an exception by default, but allow the user to override in some way that specifies what value should be used (e.g. first encountered, last, random value chosen from the possibilities, the list of all values present, preferred dtype, ...?)
  3. Ignore these duplicate entries with a warning.

Other suggestions welcome, of course.
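For illustration only, option 2 could look roughly like the sketch below. It works on plain `(keyword, value)` pairs rather than on a real header, and the function name `resolve_duplicates` and the policy names `'raise'`, `'first'`, `'last'` and `'all'` are hypothetical; nothing here is existing ccdproc API.

```python
from collections import defaultdict


def resolve_duplicates(cards, on_duplicate="raise"):
    """Collapse (keyword, value) pairs into a dict, applying a duplicate policy.

    on_duplicate is one of 'raise', 'first', 'last' or 'all'
    (hypothetical option names chosen for this sketch).
    """
    if on_duplicate not in ("raise", "first", "last", "all"):
        raise ValueError(f"unknown policy: {on_duplicate!r}")

    # Group every value under its keyword, preserving header order.
    grouped = defaultdict(list)
    for keyword, value in cards:
        grouped[keyword].append(value)

    resolved = {}
    for keyword, values in grouped.items():
        if len(values) == 1:
            resolved[keyword] = values[0]
        elif on_duplicate == "raise":
            raise ValueError(f"keyword {keyword!r} appears {len(values)} times")
        elif on_duplicate == "first":
            resolved[keyword] = values[0]
        elif on_duplicate == "last":
            resolved[keyword] = values[-1]
        else:  # "all": keep the full list of values
            resolved[keyword] = list(values)
    return resolved


# Mimics the example file: OBSTYPE appears twice with different values.
cards = [("OBSTYPE", "OBJECT"), ("EXPTIME", 60), ("OBSTYPE", 1)]
first = resolve_duplicates(cards, on_duplicate="first")
everything = resolve_duplicates(cards, on_duplicate="all")
```

With the default policy the same input raises a `ValueError` that names the offending keyword, which would also cover option 1.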

mwcraig commented 7 years ago

@MSeifert04 welcome to the wonderful world of FITS, in which all that is not expressly forbidden is allowed, and someone somewhere will have done it (usually for a reason that made a lot of sense at the time). 😬

boada commented 7 years ago

I know I haven't contributed much to this discussion after I pointed it out... but personally, I'd go with option number 2. It is still meaningful and provides a workaround for the user if they know what they are doing.

If there is still an interest, I bet I could find one of the files I was trying to use. I can probably strip out most of the target/personal info and make it available.

crawfordsm commented 7 years ago

There are plenty of cases where a keyword is repeated in a MEF extension with a different value than in the primary header, and there are very good reasons for this (for example, the data differ from the primary extension, which might not even have any data). As such, and if the user is specifying an extension to use, I would suggest that for duplicate keywords the default behavior should be to raise a warning and use the keyword in the extension. However, it should still read in the results.

I'd be open to a parameter such that the user can change that functionality, but my feeling is pretty strong that duplicates should not raise errors by default.

MSeifert04 commented 7 years ago

Is this about a duplicate keyword in the same header or a keyword that is present once in primary and once in the extension? And in the first case: How common are such files?

mwcraig commented 7 years ago

This is about a duplicate keyword in a single extension.

The fact that it originally occurred in a MEF was just coincidence, I think.

The case of a keyword appearing in several extensions has a straightforward solution -- another column should be added to the summary table for the extension number (or name), thereby removing the ambiguity.
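A rough sketch of the extension-column idea, assuming each header can be treated as a mapping with a `.get` method (plain dicts stand in for real headers here). The function name `summary_rows` and the column names `file` and `extension` are made up for this example.

```python
def summary_rows(filename, headers, keywords):
    """Build one summary row per HDU, tagged with its extension number."""
    rows = []
    for ext, header in enumerate(headers):
        # The extension number removes the ambiguity between HDUs.
        row = {"file": filename, "extension": ext}
        for key in keywords:
            row[key] = header.get(key)  # None when the keyword is absent
        rows.append(row)
    return rows


# Plain dicts stand in for the primary and first extension headers.
primary = {"OBSTYPE": "object", "NEXTEND": 4}
im1 = {"OBSTYPE": "object", "GAIN": 8}
rows = summary_rows("image359290.fits", [primary, im1], ["OBSTYPE", "GAIN"])
```

Each HDU gets its own row, so the same keyword in two extensions never competes for a single table cell.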

janga1997 commented 7 years ago

@boada There is definitely interest. I've only tested @vrooje's example files to find out that the problem is with OBSTYPE, but I would like a different file to be sure.

mwcraig commented 7 years ago

@boada -- if you could confirm that OBSTYPE appears twice in the header of the first extension in one of your files that would be great. Would confirm that the issue is the same...

MSeifert04 commented 7 years ago

In general it should be really easy to check for duplicate keywords:

from collections import Counter
hdr = something  # placeholder for any astropy.io.fits.Header
cnts = Counter(hdr.keys())
{key: n for key, n in cnts.items() if n >= 2}

boada commented 7 years ago

@mwcraig @janga1997 Here are the first two headers. I've pulled out all (hopefully) of the personal info. I don't think I am at liberty to just post the unredacted header.

> # HDU 0 in image359290.fits:
> SIMPLE  = T / File conforms to FITS standard
> BITPIX  = 8 / Bits per pixel (not used)
> NAXIS   = 0 / PHU contains no image matrix
> EXTEND  = T / File contains extensions
> NEXTEND = 4 / Number of extensions
> FILENAME= 'image359290.fits' / Original host filename
> OBJECT  = 'XXXX' / Observation title
> OBSTYPE = 'object ' / Observation type
> EXPTIME = 60 / [s] Exposure time
> EXPCOADD= 12 / [s] Single coadd exp time
> NCOADD  = 5 / Number of coadds
> RADECSYS= 'FK5 ' / Default coordinate system
> RADECEQ = 2000. / Default equinox
> RA      = 'XXXX' / [h] RA of observation
> DEC     = 'XXXX' / [deg] DEC of observation
> OBJRA   = 'XXXX' / [h] RA of target
> OBJDEC  = 'XXXX' / [deg] DEC of target
> OBJEPOCH= 2000 / [yr] Epoch of target coordinates
>
> TIMESYS = 'UTC ' / Time system
> DATE-OBS= '2016-11-14T11:12:56.4' / Date-time of observation start (UTC)
> TIME-OBS= '11:12:56.4' / Time of observation start (UTC)
> MJD-OBS = 57706.46731944 / MJD of observation start
> ST      = '07:23:36' / [h] Siderial time
>
> OBSERVAT= 'KPNO ' / Observatory
> TELESCOP= 'KPNO 4.0 meter telescope' / Telescope
> TELRADEC= 'FK5 ' / Telescope coordinate system
> TELEQUIN= 2000 / Equinox of tel coords
> TELRA   = 'XXXX' / [h] RA of telescope
> TELDEC  = 'XXXX' / [deg] DEC of telescope
> HA      = 'XXXX' / Telescope hour angle
> ZD      = 22.97 / Zenith distance
> AIRMASS = 1.086 / Airmass
> TELFOCUS= 10699 / Telescope focus
>
> INSTRUME= 'NEWFIRM ' / Mosaic detector
> MOSSIZE = '[1:4096,1:4096]' / Mosaic detector size
> NDETS   = 4 / Number of detectors in mosaic
> FILTER  = 'KXs ' / Filter name(s)
>
> OBSERVER= 'Steven Boada' / Observer(s)
> TELOP   = 'Amy Robertson' / Telescope operator or observing assistants(s)
> PROPOSER= 'XXXX' / Proposer(s)
> PROPID  = 'XXXX' / Proposal identification
> OBSID   = 'XXXX' / Observation ID
> SEQID   = 'nfkp4m_16B0174_3441' / Sequence ID
> SEQNUM  = 3441 / Sequence Number
> EXPID   = 0 / Monsoon exposure ID
> NOCID   = 2457707.1756379 / NEWFIRM ID
> PROCTYPE= 'Raw ' / Processing type
>
> NOCPIE  = 'XXXX' / PIs E-mail Address
> NOCAOE  = 'XXXX' / AOs E-mail Address
> NOCOAE  = ' ' / OAs E-mail Address
>
> NOHS    = '1.1.2 ' / NOHS ID
> NOCNO   = 13 / observation number in this sequence
> NOCORA  = 0 / [arcsec] RA offset
> NOCDITER= 1 / Dither iteration count
> NOCSKY  = 1 / sky offset modulus
> NOCNPOS = 1 / observation number in requested number
> NOCDPAT = 'F ' / Dither pattern
> NOCDROF = -50.36 / [arcsec] Dither RA offset
> NOCBIAS = 400 / Bias voltage applied
> NOCFSN  = 'open+4103' / Filter serial number
> NOCDHS  = 'DITHERSTARE' / DHS script name
> NOCMPOS = 0 / Map position
> NOCTIM  = 12 / [s] Requested integration time
> NOCCOADD= 5 / Number of coadds requested
> NOCMDOF = 0 / [arcsec] Map Dec offset
> NOCMREP = 0 / Map repetition count
> NOCTOT  = 15 / Total number of observations in set
> NOCFOCUS= 10800 / [um] ntcs_focus value
> NOHS    = '1.1.2 ' / NOHS ID
> NOCSYS  = 'ctio 4m ' / system ID
> NOCLAMP = 'unknown ' / Dome flat lamp status (on|off|unknown)
> NOCDPOS = 13 / Dither position
> NOCMITER= 0 / Map iteration count
> NOCPOST = 'dfs ' / ntcs_moveto ra dec epoch
> NOCODEC = 30 / [arcsec] Dec offset
> NOCMPAT = '4Q ' / Map pattern
> NOCDDOF = -4.71 / [arcsec] Dither Dec offset
> NOCSCR  = 'XXXX' / NOHS script run
> NOCNUM  = 1 / observation number request
> NOCMROF = 0 / [arcsec] Map RA offset
> NOCDGAVG= 4 / Number of digital averages requested
> NOCTYP  = 'DITHERSTARE' / Observation type
> NOCFSMPL= 1 / Number of fSamples requested
> NOCDREP = 1 / Dither repetition count
>
> NFOSSTMP= 64.997002 / [K] oss temp measured
> NFDETTMP= 30.004999 / [K] detector array temp measured
>
> NFFILPOS= 'KXs ' / Filter detected position name
> NFFW2POS= 3 / Wheel 2 actual position (0|1|2|3|4|5|6|7|8)
> NFFW1POS= 8 / Wheel 1 actual position (0|1|2|3|4|5|6|7|8)
>
> DECDIFF = 449.93 / [arcsec] Dec diff
> AZ      = 'XXXX' / Telescope azimuth
> TCPTRACK= 'off [e] ' / telescope tracking status
> RAINST  = 0 / [arcsec] RA instrument center
> DECOFF  = 449.929993 / [arcsec] Dec offset
> RAOFF   = 402.950012 / [arcsec] RA offset
> RAINDEX = 0 / [arcsec] RA index
> ALT     = 'XXXX' / Telescope altitude
> RAZERO  = 0 / [arcsec] RA zero
> DECINST = 0 / [arcsec] Dec instrument center
> RADIFF  = 38.86 / [arcsec] RA diff
> DECZERO = 0 / [arcsec] Dec zero
> DECINDEX= 0 / [arcsec] Dec index
>
> NFC1POS = '00:00:00.00 00:00:00.0 2007 [s]' / Camera 1 target (HH:MM:SS.SS DD:MM
> NFC1GDR = 'off [s] ' / Camera 1 guider mode
> NFC1FILT= '0 [s] ' / Camera 1 filter
>
> DOMEERR = '0 [s] ' / [deg] Dome error as distance from target
> DOMEAZ  = '0 [s] ' / [deg] Dome position
>
> TCPGDR  = 'on ' / Guider status (on|off|lock)
>
> NFC2FILT= '0 [s] ' / Camera 2 filter
> NFC2POS = '00:00:00.00 00:00:00.0 2007 [s]' / Camera 2 target (HH:MM:SS.SS DD:MM
> NFC2GDR = 'off [s] ' / Camera 2 guider mode
>
> NFECPOS = 'open ' / detected position (open|close|between)
> LAMPSTAT= 'A ' / Lamp status
> CHECKSUM= 'gef4ieZ3ged3geZ3' / HDU checksum updated 2017-01-20T17:42:11
> DATASUM = ' 0' / data unit checksum updated 2017-01-20T17:42:11
>
> # HDU 1 in image359290.fits:
> XTENSION= 'IMAGE ' / Extension type
> BITPIX  = 32 / Bits per pixel
> NAXIS   = 2 / Number of image axes
> NAXIS1  = 2112 / Length of axis 1
> NAXIS2  = 2048 / Length of axis 2
> PCOUNT  = 0 / Number of bytes following image matrix
> GCOUNT  = 1 / Number of groups
> EXTNAME = 'im1 ' / Extension name
> INHERIT = T / Inherits global header
> EXTVER  = 1 / Extension version
> IMAGEID = 1 / Image identification
> OBSID   = 'kp4m.20161114T111256' / Observation ID
> EXPID   = 0 / Monsoon exposure ID
> NOCID   = 2457707.1756379 / NEWFIRM ID
>
> OBJECT  = 'XXXX' / Observation title
> OBSTYPE = 'object ' / Observation type
> EXPTIME = 60 / [s] Exposure time
> EXPCOADD= 12 / [s] Single coadd exp time
> NCOADD  = 5 / Number of coadds
> FILTER  = 'KXs ' / Filter name(s)
>
> RA      = 'XXXX' / [h] RA of observation
> DEC     = 'XXXX' / [deg] DEC of observation
> RADECEQ = 2000. / Default equinox
> TIMESYS = 'UTC ' / Time system
> DATE-OBS= '2016-11-14T11:12:56.4' / Date-time of observation start (UTC)
> TIME-OBS= '11:12:56.4' / Time of observation start (UTC)
> MJD-OBS = 57706.46731944 / MJD of observation start
> MJDSTART= 57706.4673198 / MJD of observation start
> MJDEND  = 57706.468085 / MJD of observation end
> ST      = '07:23:36' / [h] Siderial time
>
> BUNIT   = 'adu ' / Pixel value units
> GAIN    = 8 / Nominal gain
> INSTRUME= 'NEWFIRM ' / Mosaic detector
> DETECTOR= 'SN019 ' / Array name
> DETSIZE = '[1:2046,1:2046]' / Detector size
> DETSEC  = '[1:2046,1:2046]' / Detector section
> DATASEC = '[2:2047,2:2047]' / Data section
> TRIMSEC = '[2:2047,2:2047]' / Trim section
> BIASSEC = '[2049:2112,2:2047]' / Bias section
>
> LTM1_1  = 1.0 / Detector to image transformation
> LTM2_2  = 1.0 / Detector to image transformation
> LTV1    = 1.0 / Detector to image transformation
> LTV2    = 1.0 / Detector to image transformation
>
> WCSASTRM= 'kp4m.20080213T073546 (USNO F J) by F. Valdes 2008-02-14' / WCS Source
> EQUINOX = 2000. / Equinox of WCS
> WCSAXES = 2 / WCS dimensionality
> CTYPE1  = 'RA---TNX' / Coordinate type
> CTYPE2  = 'DEC--TNX' / Coordinate type
> CRVAL1  = 0.0 / [deg] Axis 1 ref. coordinate
> CRVAL2  = 0.0 / [deg] Axis 2 ref. coordiante
> CRPIX1  = 2100.3522150252 / [pixel] Axis 1 ref. pixel
> CRPIX2  = 2097.3518840402 / [pixel] Axis 1 ref. pixel
> CD1_1   = -0.0001098030586219 / Coordinate matrix
> CD2_1   = -6.5838416476644E-07 / Coordinate matrix
> CD1_2   = 1.9872110097583E-07 / Coordinate matrix
> CD2_2   = 0.00010981299297333 / Coordinate matrix
> WAT0_001= 'system=image' / Coordinate system
> WAT1_001= 'wtype=tnx axtype=ra lngcor = "3. 4. 4. 2. 0.005222239643447366 0.230'
> WAT1_002= '3958827361763 -0.2300630522335702 -0.003928143231877321 -0.001114087'
> WAT1_003= '145613971 0.01884323423028653 -0.06406359383927175 -0.02905742131005'
> WAT1_004= '108 -0.008048497759860484 0.06452382062283505 -0.2159030032845992 -0'
> WAT1_005= '.02625288616847394 -0.03427527668789979 -0.05717312101539648 " '
> WAT2_001= 'wtype=tnx axtype=dec latcor = "3. 4. 4. 2. 0.005222239643447366 0.23'
> WAT2_002= '03958827361763 -0.2300630522335702 -0.003928143231877321 0.001100774'
> WAT2_003= '783598311 -0.009430451856139785 0.04251163389849088 -0.0923869579146'
> WAT2_004= '0815 0.01886049780937818 -0.06221944192920887 -0.01168087592751612 0'
> WAT2_005= '.06623702942513309 -0.1876219133831757 -0.02060419838613524 " '
>
> BPM     = 'nfdat$nfbpm[im1]' / Bad pixel mask
>
> FSAMPLE = 1 / No. of fowler samples
> DIGAVGS = 4 / No. of digital average samples
> CHECKSUM= 'bT3acR1YbR1abR1W' / HDU checksum updated 2017-03-10T18:09:05
> DATASUM = '3145234957' / data unit checksum updated 2017-03-10T18:09:05

It should also be said that I don't really remember which file I was working with when I initially saw this. This is almost certainly not it.

boada commented 7 years ago

@MSeifert04

In [1]: from astropy.io import fits

In [2]: f = fits.open('image359290.fits')

In [3]: hdr = f[0].header

In [4]: from collections import Counter

In [5]: cnts = Counter(hdr.keys())
   ...: {key: n for key, n in cnts.items() if n >= 2}
   ...:
Out[5]: {'': 29, 'NOHS': 2}

boada commented 7 years ago

n is just an integer and doesn't have a length.

crawfordsm commented 7 years ago

Sorry for my confusion, but yes, the use case for the same keyword appearing twice is a lot less obvious. Actually, I can't think of a good one, and it would appear in @boada's case that it wasn't intentional.

If it is the same value appearing twice, I would probably go with the option of ignoring it (as in this case).

If it is a different value, I would probably go with the astropy.io.fits case of ignoring the second one but issuing a warning.

That way the code will work regardless of what is in the header.
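As a minimal sketch of that behavior (stay silent when a repeated keyword has the same value, warn and keep the first value when it differs), again working on plain `(keyword, value)` pairs. The function name `first_values` is hypothetical and the warning class is just `UserWarning`, not anything ccdproc defines.

```python
import warnings


def first_values(cards):
    """Keep the first value per keyword; warn only when a later value differs."""
    summary = {}
    for keyword, value in cards:
        if keyword not in summary:
            summary[keyword] = value
        elif summary[keyword] != value:
            # Same keyword, different value: keep the first, but say so.
            warnings.warn(
                f"keyword {keyword!r} repeated with a different value "
                f"({value!r}); keeping {summary[keyword]!r}",
                UserWarning,
            )
        # Same keyword with the same value is ignored silently.
    return summary


# NOHS repeats with an identical value, OBSTYPE with a different one.
cards = [
    ("NOHS", "1.1.2"),
    ("OBSTYPE", "OBJECT"),
    ("NOHS", "1.1.2"),
    ("OBSTYPE", 1),
]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    summary = first_values(cards)
```

Only the OBSTYPE conflict produces a warning here; the repeated NOHS card passes through unnoticed, and the function never raises.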

MSeifert04 commented 7 years ago

> Shouldn't that be len(n) > 2

@janga1997 What do you mean? That >= 2 is to ignore all keywords that appear only once.

> If it is a different value, I would probably go with the astropy.io.fits case of ignoring the second one but issuing a warning.

👍

janga1997 commented 7 years ago

@MSeifert04 Sorry for that comment, I didn't read yours properly before commenting. And I deleted my comment immediately.

vrooje commented 7 years ago

I see the same keyword appearing twice a lot, especially with multiple uses of the HISTORY keyword, where there's one entry for each change that was made during data reduction, e.g. overscan subtracted, debiasing completed, flat-fielded, etc.

Is that the kind of duplication we're talking about?

MSeifert04 commented 7 years ago

@vrooje It's about duplicate keywords that are not HISTORY or COMMENT. These two already work correctly with ImageFileCollection and astropy.io.fits.Header.

SaOgaz commented 7 years ago

@vrooje, more specifically, HISTORY and COMMENT are not "normal" keywords, per the FITS standard but are "Commentary keywords". "Blank" keywords are also included as commentary keywords.
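Putting that together with the `Counter` check from earlier in the thread, a duplicate check that skips the commentary keywords might look like the sketch below. It takes any iterable of keyword strings (for a real header, presumably `hdr.keys()` as in the earlier snippet); the helper name `duplicated_keywords` is made up for this example.

```python
from collections import Counter

# Commentary keywords may legitimately repeat: blank, HISTORY and COMMENT.
COMMENTARY = {"", "HISTORY", "COMMENT"}


def duplicated_keywords(keywords):
    """Return {keyword: count} for non-commentary keywords seen more than once."""
    counts = Counter(key.upper() for key in keywords)
    return {
        key: n
        for key, n in counts.items()
        if n >= 2 and key not in COMMENTARY
    }


# Blank and HISTORY repeat legitimately; only NOHS is a real duplicate.
keys = ["SIMPLE", "NOHS", "", "HISTORY", "", "NOHS", "HISTORY", "OBSTYPE"]
dupes = duplicated_keywords(keys)
```

Applied to the keywords of the header posted above, this kind of filter would drop the 29 blank entries and leave only NOHS.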

mwcraig commented 7 years ago

The astropy FITS documentation says this:

> Most keywords in a FITS header have unique names. If there are more than two cards sharing the same name, it is the first one accessed when referred by name. The duplicates can only be accessed by numeric indexing.
>
> There are three special keywords (their associated cards are sometimes referred to as commentary cards), which commonly appear in FITS headers more than once. They are (1) blank keyword, (2) HISTORY, and (3) COMMENT. Unlike other keywords, when accessing these keywords they are returned as a list:

So like @crawfordsm said, keep the first value, ignore the rest. Raising a warning seems like a good idea.

saimn commented 7 years ago

@MSeifert04 : about https://github.com/astropy/ccdproc/issues/464#issuecomment-285732761, yes it's possible!

In [4]: h[('obstyp', 0)]
Out[4]: 'OBJECT1'

In [5]: h[('obstyp', 1)]
Out[5]: 'OBJECT2'

I must admit I didn't know this before looking at the "compressed header" issue (#5866). Oh, and from @mwcraig's comment above it's valid. FITS is so fun 😄

janga1997 commented 7 years ago

@mwcraig @crawfordsm I would like to work on this issue. Have we reached a consensus on a solution for this issue? That is, keeping the first value, ignoring the second, and raising a warning?

MSeifert04 commented 7 years ago

@saimn Yeah, it's not only fun, it's also full of surprises 😄