Closed mwcraig closed 7 years ago
Is a FITS header allowed to have multiple identical keywords for non-comments/history cards? I mean it's possible to create them but it's nearly impossible to handle them with astropy.io.fits. Maybe the correct thing to do would be to warn and ignore or error out in those cases.
With "it's possible to create them but it's nearly impossible to handle them" I mean:
>>> from astropy.io import fits
>>> h = fits.Header()
>>> h['obstyp'] = 'OBJECT1'
>>> h.append(('obstyp', 'OBJECT2'))
>>> h
OBSTYP = 'OBJECT1 '
OBSTYP = 'OBJECT2 '
>>> h['obstyp'] # just ignores the second one!
'OBJECT1'
@vrooje @boada @crawfordsm @MSeifert04 Any comments on the options below would be most welcome.
For reference, the FITS standard v3 says nothing that I could find about repeated keywords (admittedly just gave it a quick skim). It does not disallow repeated keywords, I think.
A few ways I can see handling it:
ImageFileCollection
Other suggestions welcome, of course.
@MSeifert04 welcome to the wonderful world of FITS, in which all that is not expressly forbidden is allowed, and someone somewhere will have done it (usually for a reason that made a lot of sense at the time). 😬
I know I haven't contributed much to this discussion after I pointed it out... but personally, I'd go with option number 2. It is still meaningful and provides a workaround for the user if they know what they are doing.
If there is still an interest, I bet I could find one of the files I was trying to use. I can probably strip out most of the target/personal info and make it available.
There are plenty of cases where a keyword will be repeated in MEF with a different value than in the header and very good cases for this (like the data is different than in the primary extension, which might not even have any data). As such, and if the user is specifying an extension to use, I would suggest that in the case of duplicate keywords that the default behavior should raise a warning and use the keyword in the extension. However it should still read in the results.
I'd be open to a parameter such that the user can change that functionality, but my feeling is pretty strong that duplicates should not raise errors by default.
Is this about a duplicate keyword in the same header, or a keyword that is present once in the primary header and once in the extension? And in the first case: how common are such files?
This is about a duplicate keyword in a single extension.
The fact that it originally occurred in a MEF was just coincidence, I think.
The case of a keyword appearing in several extensions has a straightforward solution -- another column should be added to the summary table for the extension number (or name), thereby removing the ambiguity.
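A rough sketch of what that extra column could look like, assuming each file is represented as a list of per-extension `(keyword, value)` pairs. The `summary_rows` helper and its row layout are hypothetical and are not how ImageFileCollection builds its summary table:

```python
def summary_rows(files, keywords):
    """Build one summary row per (file, extension) pair.

    `files` maps a file name to a list with one entry per extension;
    each entry is a list of (keyword, value) pairs from that header.
    Hypothetical helper, not part of ccdproc.
    """
    rows = []
    for name, extensions in files.items():
        for ext_number, cards in enumerate(extensions):
            # Keep the first value seen for each keyword in this header.
            first_values = {}
            for key, value in cards:
                first_values.setdefault(key.upper(), value)
            # The extension number removes the ambiguity between headers.
            row = {"file": name, "extension": ext_number}
            for key in keywords:
                row[key] = first_values.get(key.upper())
            rows.append(row)
    return rows
```

With this layout, the same keyword in the primary header and in an extension ends up in two different rows instead of competing for one cell.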
@boada There is definitely interest. I've only tested @vrooje's example files to find out that the problem is with OBSTYPE, but I would like a different file to be sure.
@boada -- if you could confirm that OBSTYPE appears twice in the header of the first extension in one of your files that would be great. It would confirm that the issue is the same...
In general it should be really easy to check for duplicate keywords:
from collections import Counter
hdr = something  # any astropy.io.fits.Header
cnts = Counter(hdr.keys())
{key: n for key, n in cnts.items() if n >= 2}
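A minimal sketch of the same check that also skips the commentary keywords (blank, HISTORY, COMMENT), which are expected to repeat. The name `find_duplicate_keywords` and its `ignore` parameter are hypothetical and not part of astropy or ccdproc; the function takes any iterable of keyword strings, such as `hdr.keys()`:

```python
from collections import Counter

# Commentary keywords may legitimately appear many times in one header.
COMMENTARY = ("", "HISTORY", "COMMENT")


def find_duplicate_keywords(keywords, ignore=COMMENTARY):
    """Return {keyword: count} for keywords that appear at least twice."""
    counts = Counter(key.upper() for key in keywords)
    return {key: n for key, n in counts.items()
            if n >= 2 and key not in ignore}


# Made-up keyword list standing in for hdr.keys()
keys = ["SIMPLE", "", "", "NOHS", "HISTORY", "NOHS", "HISTORY"]
print(find_duplicate_keywords(keys))  # {'NOHS': 2}
```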
@mwcraig @janga1997 Here are the first two headers. I've pulled out all (hopefully) of the personal info. I don't think I am at liberty to just post the unredacted header.
It should also be said that I don't really remember which file I was working with when I initially saw this. This is almost certainly not it.
@MSeifert04
In [1]: from astropy.io import fits
In [2]: f = fits.open('image359290.fits')
In [3]: hdr = f[0].header
In [4]: from collections import Counter
In [5]: cnts = Counter(hdr.keys())
   ...: {key: n for key, n in cnts.items() if n >= 2}
   ...:
Out[5]: {'': 29, 'NOHS': 2}
n is just an integer and doesn't have a length.
Sorry for my confusion, but yes, the use case for the same keyword appearing twice is a lot less obvious. Actually, I can't think of a good one, and it would appear in @boada's case that it wasn't intentional.
If it is the same value appearing twice, I would probably go with the option of ignoring it (as in this case).
If it is a different value, I would probably go with the astropy.io.fits case of ignoring the second one but issuing a warning.
That way the code will work regardless of what is in the header.
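One way this rule could look, as a sketch only: the helper name `first_value` is hypothetical, and it works on plain `(keyword, value)` pairs (for example `[(c.keyword, c.value) for c in hdr.cards]`) rather than on anything inside ccdproc. Identical repeats are ignored silently; differing repeats keep the first value and emit a warning:

```python
import warnings


def first_value(cards, keyword):
    """Return the first value for `keyword`, warning on conflicting repeats.

    `cards` is an iterable of (keyword, value) pairs from one header.
    Hypothetical helper illustrating the proposed behavior.
    """
    keyword = keyword.upper()
    values = [value for key, value in cards if key.upper() == keyword]
    if not values:
        raise KeyError(keyword)
    # Repeats with the same value are harmless, so only warn on a mismatch.
    if any(value != values[0] for value in values[1:]):
        warnings.warn(
            f"Keyword {keyword} appears {len(values)} times with different "
            f"values; using the first one, {values[0]!r}.",
            UserWarning,
        )
    return values[0]
```

Because nothing is raised for duplicates, a header like the one with two OBSTYPE cards would still be read, just with a warning attached.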
Shouldn't that be len(n) > 2?
@janga1997 What do you mean? That >= 2 is to ignore all keywords that appear only once.
If it is a different value, I would probably go with the astropy.io.fits case of ignoring the second one but issuing a warning.
👍
@MSeifert04 Sorry for that comment, I didn't read yours properly before commenting. And I deleted my comment immediately.
I see the same keyword appearing twice a lot, especially with multiple uses of the HISTORY keyword, where there's one entry for each change that was made during data reduction, e.g. overscan subtracted, debiasing completed, flat-fielded, etc.
Is that the kind of duplication we're talking about?
@vrooje It's about duplicate keywords that are not HISTORY or COMMENT. These two already work correctly with ImageFileCollection and astropy.io.fits.Header.
@vrooje, more specifically, HISTORY and COMMENT are not "normal" keywords per the FITS standard, but are "Commentary keywords". "Blank" keywords are also included as commentary keywords.
The astropy FITS documentation says this:
Most keywords in a FITS header have unique names. If there are more than two cards sharing the same name, it is the first one accessed when referred by name. The duplicates can only be accessed by numeric indexing.
There are three special keywords (their associated cards are sometimes referred to as commentary cards), which commonly appear in FITS headers more than once. They are (1) blank keyword, (2) HISTORY, and (3) COMMENT. Unlike other keywords, when accessing these keywords they are returned as a list:
So like @crawfordsm said, keep the first value, ignore the rest. Raising a warning seems like a good idea.
@MSeifert04 : about https://github.com/astropy/ccdproc/issues/464#issuecomment-285732761, yes it's possible!
In [4]: h[('obstyp', 0)]
Out[4]: 'OBJECT1'
In [5]: h[('obstyp', 1)]
Out[5]: 'OBJECT2'
I must admit I didn't know this before looking at the "compressed header" issue (#5866). Oh, and from @mwcraig's comment above it's valid, FITS is so fun 😄
@mwcraig @crawfordsm I would like to work on this issue. Have we reached a consensus on a solution? That is, keeping the first value, ignoring the second, and raising a warning?
@saimn Yeah, it's not only fun, it's also full of surprises 😄
What happens:
If a FITS file that has the same keyword more than once in the header is added to an ImageFileCollection, building the collection fails as reported in #423. @boada was on the right track in his comment https://github.com/astropy/ccdproc/issues/423#issuecomment-261349459, which illustrates that the reported issue has to do with the contents of the headers, not with the fact that the files are multi-extension. This example file from @vrooje demonstrates the issue. In that file, the keyword OBSTYPE appears twice in the primary header, once with value 'OBJECT' and again with value 1.
Based on @boada's comment https://github.com/astropy/ccdproc/issues/423#issuecomment-261351015 I'm guessing the same issue occurred there, too.
Not sure yet what the expected behavior should be in this case. 😕
Able to reproduce with astropy 1.3, ccdproc v1.2.0.