Open kelle opened 2 years ago
As a result of digging into this, I discovered that header data does not currently roundtrip using Spectrum1D.write
. I propose that this be implemented using something like this:
# Make a dedicated HDU for the header and spectrum
hdu0 = fits.PrimaryHDU(header=header)
hdu1 = fits.BinTableHDU(data=NIR_table)
hdu1.header['EXTNAME'] = 'SPECTRUM'
hdu1.header.set('OBJECT', NIRSpec_dict['object_name'], 'Object Name')
# Combine the two HDUs into a multi-extension FITS (MEF)
spectrum_mef = fits.HDUList([hdu0, hdu1]) # hdu0 is header and hdu1 is data
# Write out the MEF to a file
spectrum_mef.writeto(fits_filename, output_verify="exception")
# Demonstrate how to read the file back in using specutils
spec1d_NIR = Spectrum1D.read(fits_filename, format='tabular-fits')
header_NIR = fits.getheader(fits_filename)
If folks agree, I would be happy to implement this given some guidance.
@dhomeier might also be interested in this issue since he is working on:
Summarizing relevant discussions from Slack:
If someone modifies the spectrum, the keywords might become inaccurate. My proposal is to add a warning that says something like, "Header keywords are the responsibility of the user and are not guaranteed to be accurate." We can open issues and what not on specutils routines which modify header keywords and gradually make this less true. But I think that header round tripping is important enough to tolerate this shortcoming.
I am not sure how to implement this for Spectrum2D, 3D or SpectrumList. I am happy to just do it for Spectrum1D -- this would be immensely useful -- OR to figure out how to make it extendable to these other classes.
First I am not sure this issue really is a duplicate or, or completes #908 – the latter is primarily on better documentation of the existing options for writing spectra, while this one at least with the recent discussion is heading more into proposing new functionality.
I think the current "generic" read/write loaders (in fact the only ones implementing write functionality) have already been constructed with some round-tripping support in mind, though incomplete yet.
E.g. #1009 tries to improve write and read-back of the mask
and uncertainty
attributes.
One important point here is that Spectrum1D
has no header
property as such; that is rather tied to the specific FITS file layout.
The location to hold general information of that type is meta
, which could be constructed directly from header information of a FITS HDU, or be a superset of the same, or could also contain objects that are very hard or impossible to serialise to a FITS header. But round-trip of all information that can be represented as a FITS header would certainly be a reasonable target. I think the main problems here are
spectrum.meta
or creating a spectrum.meta['header']
dictionary that contains only the FITS header information. The latter is more robust as other content could be added to meta
later, but within that scheme the FITS writers would probably be restricted to writing only meta['header']
and no other metadata information (otherwise I foresee running into an infinite recursion...).meta
attributes with content most people might never look at.Spectrum1D
e.g. as flux.dtype
, flux.unit
. Still storing them in meta
in a way is redundant and may get out of sync if the data parts are modified. I think to some extent the header parsers already try to take care of that and are stripping cards like BUNIT
(which would be overwritten when writing back to a HDU anyway).A good starting point could perhaps be to review what info is already preserved in round-trip tests like https://github.com/astropy/specutils/blob/ca3153f05b8d3613bba0b865e273187e8ff176c4/specutils/tests/test_loaders.py#L608-L611
and what is missing that you would like to be saved. As sketched in your proposed implementation, getting header data back from tabular-fits
would require bypassing the Table.read
interface in the loader, as it provides to my knowledge no option for saving header data. wcs1d-fits
would already do this, but then again is inconsistent with even its own writer, since it is using the meta['header']
scheme, while both writers are trying to write back all of meta
with update_header=True
set. Again I think there has been discussion why that would be problematic to change, and break existing behaviour.
We've tested the spectrum.meta.update
method and it's pretty close! Problems include
extra END keywords. This makes many header keywords inaccessible by other tools. For example,
header=fits.getheader(tmpfile)
results in a header which just has the minimal keywords and not any of the things we added. (I assume this is because of the no-loneger-necessary END keyword)
COMMENT and HISTORY are stripped. I assume this is on purpose. Is there a way to keep them?
That's what you get with fits.getheader()
? That would be an io.fits
issue, but I cannot reproduce any stripping of COMMENT cards.
Looking back at
https://github.com/astropy/specutils/blob/ca3153f05b8d3613bba0b865e273187e8ff176c4/specutils/io/default_loaders/tabular_fits.py#L117-L127
for anything you put into a meta['header']
directory you should actually not even need the update_header
option; but from either source it should keep everything that has types it can handle, except for NAXIS*.
That's also something perhaps to be matched in wcs1d-fits
, which by default only initialises from the WCS header.
<class 'astropy.io.fits.header._HeaderCommentaryCards'>
spec.meta.update(header)
, the meta doesn't roundtrip unless I use update_header = True
. Not sure why. (This is not a big deal to me at the moment.)spec1d.meta.update(header)
, I've lost information.io.fits
or from Table
. I've posted on Slack to track this one down. I think the bottom line is that meta
doesn't maintain the functionality that one expects of FITS headers.
We could continue to go down this road or maybe I should just write a custom reader/writer that is based on tabular-fits
but expects HDU0 to have the header data.
Just to show y'all what I'm talking about, here's the degradation of the header information using the current tooling.
The header data I have compiled and added using spec1d.meta.update(header)
:
OBJECT = 'VHS1256b' / Name of observed object
RA = '194.007637' / [deg] Pointing position
DEC = '-12.957692' / [deg] Pointing position
TELESCOP= 'JWST ' / name of telescope
INSTRUME= 'NIRSpec ' / name of instrument
SPECBAND= 'NIR ' / SED.bandpass
SPEC_VAL= 2.25 / [um] Characteristic spec coord
SPEC_BW = 4.5 / [um] Width of spectrum
TDMIN1 = 0.0 / [um] Starting wavelength
TDMAX1 = 4.5 / [um] Ending wavelength
DATE = '2023-02-24' / Date of file creation
COMMENT To read this file in with specutils use Spectrum1D.read() with format =
COMMENT 'tabular-fits'
HISTORY Data Reduced by JWST pipeline version 1.7.2, CRDS version 11.16.12, CRDS
HISTORY context jwst_0977.pmap
The header obtained via meta
in the read back in file:
OrderedDict([('OBJECT', 'VHS1256b'),
('RA', '194.007637'),
('DEC', '-12.957692'),
('TELESCOP', 'JWST'),
('INSTRUME', 'NIRSpec'),
('SPECBAND', 'NIR'),
('SPEC_VAL', 2.25),
('SPEC_BW', 4.5),
('TDMIN1', 0.0),
('TDMAX1', 4.5),
('DATE', '2023-02-24')])
The header obtained using fits.getheader
:
SIMPLE = T / conforms to FITS standard
BITPIX = 8 / array data type
NAXIS = 0 / number of array dimensions
EXTEND = T
And the header directly from the FITS file
SIMPLE = T / conforms to FITS standard
BITPIX = 8 / array data type
NAXIS = 0 / number of array dimensions
EXTEND = T
END XTENSION= 'BINTABLE' / binary table extension
BITPIX = 8 / array data type
NAXIS = 2 / number of array dimensions
NAXIS1 = 24 / length of dimension 1
NAXIS2 = 10 / length of dimension 2
PCOUNT = 0 / number of group parameters
GCOUNT = 1 / number of groups
TFIELDS = 3 / number of table fields
TTYPE1 = 'wavelength'
TFORM1 = 'D '
TUNIT1 = 'um '
TTYPE2 = 'flux '
TFORM2 = 'D '
TUNIT2 = 'Jy '
TTYPE3 = 'uncertainty'
TFORM3 = 'D '
TUNIT3 = 'Jy '
OBJECT = 'VHS1256b'
RA = '194.007637'
DEC = '-12.957692'
TELESCOP= 'JWST '
INSTRUME= 'NIRSpec '
SPECBAND= 'NIR '
SPEC_VAL= 2.25
SPEC_BW = 4.5
TDMIN1 = 0.0
TDMAX1 = 4.5
DATE = '2023-02-24'
END
- In my case, the COMMENT and HISTORY cards are stripped because they are
<class 'astropy.io.fits.header._HeaderCommentaryCards'>
Ah, I did not expect that would not qualify as subtype of str
. But that's easily enough added to the filter.
- Even though I use
spec.meta.update(header)
, the meta doesn't roundtrip unless I useupdate_header = True
. Not sure why. (This is not a big deal to me at the moment.)
That's exactly as expected, as meta.update(header)
will just pull the cards directly into meta
. To have it automatically saved, you need to assign spec.meta['header'] = header
instead. This will then save only header
of course.
Then instead of having meta['OBJECT'] = 'VHS1256b'
you'd have meta['header']['OBJECT'] = 'VHS1256b'
and so on.
- The meta dictionary doesn't keep the comment that goes along with each keyword/value pair. A FITS header a list of three elements, not just two. The keyword, the value, AND the comment. So as soon as I do
spec1d.meta.update(header)
, I've lost information.
OK that's a more serious problem; I think proper treatment of comments has also been discussed a couple of times, but not sure if a good solution has come out of it.
I think the bottom line is that
meta
doesn't maintain the functionality that one expects of FITS headers.
Or, put it the other way, it has much wider functionality, so cannot be expected to conform to all FITS header conventions.
Thinking that over again, I am more and more leaning towards sticking to meta['header']
for a round-trip solution – i.e. use the default (and document it appropriately), that only that part of the meta information will be preserved through read-write operations, while other content should be treated as available for runtime only. That would leave it to the user to ensure only FITS header conforming cards go in there, although one might think of defining an actual meta.header
or Spectrum1D.header
attribute that does check this. A bit more work though...
The header obtained using
fits.getheader
:
That is just returning the PrimaryHDU header by default, so fits.getheader(1)
or fits.getheader('SPECTRUM')
should get you more complete content already.
Progress!
spec1d_NIR.meta['header'] = header
works without update_header=True
AND maintains the COMMENT and HISTORY cards. Woot! fits.getheader(fits_filename_NIR, 1)
also works! Unfortunately, SPECTRUM
does not work because the extension is not named that.This is definitely better, but I still think the meta['header'] should be maintained in hdu0 rather than the 1st extension. Is this an issue for table
?
fits.getheader(fits_filename_NIR, 1) also works! Unfortunately, SPECTRUM does not work because the extension is not named that.
Sorry, got that confused with your piece of code. No, 'tabular-fits' does not name the extension yet (only just adding that to 'wcs1d', but there it's somewhat more important to identify the different HDUs for flux, mask etc.). Still should perhaps think of a good default name.
This is definitely better, but I still think the meta['header'] should be maintained in hdu0 rather than the 1st extension. Is this an issue for table?
See Slack comment; I think table
cannot write there.
More generally, if at some point we get to write several spectrum HDUs to a single table, header info specific to the individual files may better stay in the respective HDU header.
Don't want to interrupt the above rather promising thread, but I just wanted to make sure I understand the desires here: @kelle, your goal is to make sure the "really actually 1d" spectral use case round-trips, right? I think if we can get that to work with the updates @dhomeier is doing to wcs1d-fits
that's great, and is definitely better than another writer.
Alternatively/additionally, is the underlying issue here really that it's not clear from the docs how we want someone to do this?
Yeah, I only care about the wavelength, flux, and uncertainty of one spectrum.
The docs are most certainly not clear regarding the vision for meta['header']
I'm also pretty set that I want the header in hdu0. Could also be in meta but I want a more traditional/familiar looking FITS file and header. The only route I see to do this is to make a new writer/reader.
The docs are most certainly not clear regarding the vision for meta['header']
I think the docs have never gone into that level of detail as to what is the recommended structure for metadata, although all the examples of loaders are following the meta['header']
convention (but then I think the way the generic_writer
uses it is already inconsistent with that). The docs are certainly a bit lacking in terms of information how to use the existing loaders, and what functionality they have, compared to writing a new custom loader. And for that, having >90 % of https://specutils.readthedocs.io/en/stable/custom_loading.html#creating-a-custom-loader filled with a full source listing of jwst_loader
at the moment does not appear very helpful to me.
https://specutils.readthedocs.io/en/stable/custom_loading.html#creating-a-custom-loader
I'm also pretty set that I want the header in hdu0. Could also be in meta but I want a more traditional/familiar looking FITS file and header. The only route I see to do this is to make a new writer/reader.
If it needs to output to a BINTABLE
that's out of scope of wcs1d-fits
, and I think needing to write something to the Primary HDU woul require a relatively significant extension to tabular-fits
; not sure if that still fits in.
Note that meta
in my comments so far always refers to the Spectrum1D
attribute, without any implication where and how that would be written to a file (but in FITS in practice there is little alternative to putting it into a header of some HDU).
@cshanahan1 and I are working on this at a hack day!
I'm working on writing some tests to demonstrated current and desired behavior.
Documenting some thoughts as followup from the spectro workshop hackfest last week — IIUC @kelle identified that the problem with tabular-fits dropping the metadata when writing + reading was that the writer was putting the metadata in HDU 1 along with the bintable, while the reader was loading the metadata from HDU 0. There was discussion about whether this should be fixed on the writer or the reader side.
For the purpose of the existing "tabular-fits" format, I would like to advocate for keeping the metadata in HDU 1 along with the table, and fixing this on the reader side. 3 reasons in descending order of priority:
spec = Table.read('file-written-with-tabular-fits-format.fits')
spec.meta # works!
plt.plot(spec['wavelength'], spec['flux'])
if the metadata was in HDU 0 you could still access it with additional steps, but it is handy to bundle it in the way that astropy Table.read()
and fitsio.read(filename, extension, header=True)
work.
To be clear:
I'm actively working on this right now. :)
A compromise solution is to put the metadata in both headers. That's actually easy enough.
That is not something under the control of the reader though, i.e. if the loader encounters an HDUList with non-minimal, but different headers in several HDUs, should it give preference to one, try to merge everything into one meta
dictionary, or store them in individual meta['headerN']
dictionaries (if we decide to go with that structure).
A good feature would certainly be to allow selecting the HDU via kwarg, as the proposed writer in #1105 is already supporting.
Documenting some thoughts as followup from the spectro workshop hackfest last week — IIUC @kelle identified that the problem with tabular-fits dropping the metadata when writing + reading was that the writer was putting the metadata in HDU 1 along with the bintable, while the reader was loading the metadata from HDU 0. There was discussion about whether this should be fixed on the writer or the reader side.
This is actually opposite of the way the current tabular-fits
loader is doing it – the reader is loading metadata from HDU 1 (or whichever one is specified as data HDU). There might be some confusion as some table formats can be read with this loader, but by default are read by a specialised one like MUSCLES SED
, which does read the header from the primary HDU.
Bringing this up again as the proposal in #1113 is currently to make reading metadata (only) from HDU 0 the default for tabular-fits
as well.
I'm having a lot of trouble figuring out how to write a Spectrum1D object to a file. A few of my questions:
I'm happy to work on this once I understand it. :)