Closed sbailey closed 1 year ago
An additional idea: desitarget.io.write_with_units() automatically looks up the units for common column names from a yaml file stored in a data directory.
desitarget.io.write_with_units()
has some frailties. It was only designed to work with fitsio
and, consequently, some of the units stored in the yaml file aren't strictly FITS compliant. So, I certainly wouldn't copy it blindly.
But, I found the idea of having a database of units pretty slick (if I'd actually been smart enough to get the units correct!) because I didn't have to remember the unit for a common column name like PMRA
or PARALLAX_IVAR
, or include the unit every time I wrote RA
/DEC
/etc.
I really like @geordie666's solution, and if I had thought of it (or known about it) I would have implemented something like that for FastSpecFit
!
This seems reasonably straightforward, but there is a technical detail related to comments that should be mentioned. There are two possible ways to add a comment to a column in a binary table. There's the way everyone probably thinks of immediately:
TTYPE01 = 'Z ' / Redshift
and then there is:
TTYPE01 = 'Z '
TCOMM01 = 'Redshift'
The latter is probably what we actually want, because there is a known issue with fitsio
related to comments on TTYPExx
header keywords.
In terms of strict
ness, I think it would be more reasonable to emit a warning if a column doesn't exist in the FITS file, rather than raise an error. After all, if there is no column to add units or comments to, then the function can simply do nothing, and that's probably OK.
In terms of where to read the units and descriptions from, I think that with the ongoing effort to create the column_descriptions file in the desidatamodel repository, maybe it would make sense to read these informations from there, having a centralized document for all the DESI columns.
Also, a comment about the function interface, in astropy.io.fits, the column in the header associated with units is TUNITnn, not BUNITnn
In terms of where to read the units and descriptions from, I think that with the ongoing effort to create the column_descriptions file in the desidatamodel repository, maybe it would make sense to read these informations from there, having a centralized document for all the DESI columns.
This shouldn't be the final word, but as the developer of a code (FastSpecFit
) which depends on some but not all of the desihub software, I would be more in favor of putting the unit definitions in desiutil
, which is a critical dependency (unlike desidatamodel
, which I really don't want to have to bring in). And if the decision is made to put these definitions in desiutil
, I'd be happy to contribute by including a wider range of non-standard units like equivalent widths, etc.
I agree with @moustakas. The dependency should be the other-way-around. i.e., if anything, desidatamodel
would have a dependency on the unit tables in desiutil
.
I think it would be fine to copy across the unit tables from desitarget
as a first step, fixing the non-FITS-compliant units. A possible test for how to fix those non-compliant units is noted at the bottom of the desitarget
data model page.
How difficult would it be to move the column descriptions work to desiutil and work to merge everything there?
Cool! Sounds readonable to have it in desiutil. Actually, the effort in the datamodel was because from the beggining units and descriptions were not present in the fits files and filling those by hand was very tiresome, but if we develop tools in utils that add/check those directly to the fits files, datamodel would directly inherit those infos from the fits files.
What Im in favour is to have just one centralized description file for this, and at the moment, the one in datamodel is more complete, so we should merge both.
El mar, 28 mar 2023, 17:01, ashleyjross @.***> escribió:
How difficult would it be to move the column descriptions work to desiutil and work to merge everything there?
— Reply to this email directly, view it on GitHub https://github.com/desihub/desiutil/issues/191#issuecomment-1487200782, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADXSAV35M7RZFRVP7USEX2LW6MDM5ANCNFSM6AAAAAAWF3RQMQ . You are receiving this because you commented.Message ID: @.***>
It feels like "the better is the enemy of the good" has impacted this ticket. i.e. it would quite helpful to have utility scripts/functions to add units to FITS tables (the good), even if we don't yet have fully automated support for propagating units of standard column names (the better). And even if/when we implement "the better", we still need "the good" for cases that aren't covered by the default list of column:units.
I recommend that in desiutil we concentrate purely on the low-level adding units functionality. Desidatamodel can inherit from that and convert the column:units definitions inheriting the low level functionality from desiutil.
We need this for post-facto patching DR1 redshift catalogs, so I'm going to add it to the DR1 Patching project.
Corner cases to consider:
LONGSTRN
keyword.I've discovered two problems: astropy/astropy#15255 and astropy/astropy#15256. Although these involve different types of FITS HDUs, the effect is the same: it may not be possible to create a byte-for-byte exact copy of a FITS file when copying HDUs from one file to another.
Why is this so serious? Presumably, when we are modifying the units in a HDU, we absolutely do not want to modify any other HDU. But these bugs indicate that other HDUs may be modified as a side effect.
I have a workaround for astropy/astropy#15256. It may be possible to develop a workaround for the other bug, but I just want to prepare the ground in case I cannot, or it takes longer than a few days.
OK, thanks for identifying those gotchas. For the moment, let's shift focus from the original "add units to a pre-existing file" to making it easier to create the unit-ful file in the first place with a function for "add units to a pre-existing table" that would be used before writing the file in the first place. That second case (adding units to a Table based on column names) is also what we most immediately need for updating desi_zcatalog to re-add units while writing the summary catalogs.
i.e. add the glue to parse an input file like column_descriptions.csv and use that to add units to table columns, while providing options for what to do if a given column already has units.
Bonus points if you can also propagate the descriptions. I don't know if/how those are internally handled by astropy Tables for whether that is an easy or hard task.
OK, that should be relatively easy, at least for units.
I was already developing the functionality to add comments (i.e. the descriptions) to FITS files. I'm not sure how that works for Tables.
In other news, it turns out that even though the TCOMMxx
header card is used, it is not (yet) part of the FITS standard, so we might have to fall back on using header card style comments:
TTYPE01 = 'Z ' / Redshift
Further technical details on the relationship between FITS binary tables and Astropy Tables:
When Table.read()
reads a FITS binary table (BinTableHDU
), the keywords for the HDU are preserved in the .meta
attribute, along with their values. However, all FITS header comments are lost.
This means that round-tripping Table.read()
--> add units --> somehow add comments --> Table.write()
will not add comments. One would have to add units to the table, then convert the table to a BinTableHDU
, then add comments.
I haven't tested this with the latest versions of astropy
, but another issue I tripped up on with FastSpecFit
is that calling astropy.table.vstack
on two astropy.table.QTable
s which have units does not preserve the units.
In the end what I do is to add the units right before writing out.
Please ignore if this comment isn't helpful.
Have you reported that issue to astropy? If so, what is the issue number?
But good point, the intent of this function is to do exactly what you do, add the units back in after creating or modifying the table.
Two gotchas:
Many of the comments in desidatamodel column_descriptions.csv are longer than fit in FITS due to the 80 column limit per line resulting in truncated comments like
TTYPE3 = 'SUBPRIORITY' / Random subpriority [0-1) to break assignment ti
In some cases we could reasonably trim down the comment to fit without losing info; in other cases the description needs enough caveats that we may need to retain the longer comment for the desidatamodel documentation and support a "ShortDescription" column with an abbreviated version, e.g. "Declination" instead of "Barycentric ICRS Declination tied to Gaia DR2 blah blah"
FITS doesn't formally support nanomaggies, and astropy.io.fits.convenience.table_to_hdu
is picky about that:
from desiutil.annotate import annotate_table
from astropy.table import Table
from astropy.io import fits
t = Table() t['RA'] = np.arange(5) t['FLUX'] = np.arange(5)
annotate_table(t, units=dict(RA='deg', FLUX='nanomaggy'), inplace=True) print(t)
hdu = fits.convenience.table_to_hdu(t) print(hdu.header)
Results:
RA FLUX deg nmgy
0 0
1 1
...
WARNING: The unit 'nmgy' could not be saved in native FITS format and cannot
be recovered in reading. It can roundtrip within astropy by using QTable both
to write and read back, though one has to enable the unit before reading.
[astropy.io.fits.convenience]
XTENSION= 'BINTABLE' / binary table extension
BITPIX = 8 / array data type
NAXIS = 2 / number of array dimensions
NAXIS1 = 16 / length of dimension 1
NAXIS2 = 5 / length of dimension 2
PCOUNT = 0 / number of group parameters
GCOUNT = 1 / number of groups
TFIELDS = 2 / number of table fields
TTYPE1 = 'RA '
TFORM1 = 'K '
TUNIT1 = 'deg '
TTYPE2 = 'FLUX '
TFORM2 = 'K '
END
Note that `annotate_table` successfully added nanomaggy -> nmgy to the FLUX column, but `astropy.io.fits.convenience.table_to_hdu` stripped it back out and the resulting HDU does not have a TUNIT2 for the TTYPE1='RA' column.
One the plus side, `desispec.io.util.write_bintable(filename, table, units=dict_of_units)` works since it adds the TUNITn keywords "by hand" based upon dict_of_units[colname] without checking that they are valid FITS units (i.e. let the user violate the standard if they want to or don't otherwise know, which could be considered a feature or a bug...)
Right, I already made a note about long comments.
It looks like we have a convenient clue in the warning. We use QTable instead of Table, and pre-define the unit, as suggested in https://docs.astropy.org/en/stable/units/combining_and_defining.html.
Some notes on defining units:
>>> import astropy.units as u
>>> maggy = u.def_unit('maggy', 3631.0 * u.Jy)
>>> maggy
Unit("maggy")
>>> 1.0 * maggy
<Quantity 1. maggy>
>>> (1.0 * maggy).to(u.Jy)
<Quantity 3631. Jy>
>>> nanomaggy = u.def_unit('nanomaggy', 1.0e-9 * maggy)
>>> (1.0 * nanomaggy).to(u.Jy)
<Quantity 3.631e-06 Jy>
>>> u.Magnitude(1.0 * maggy)
<Magnitude -0. mag(maggy)>
>>> u.Magnitude(1.0 * nanomaggy)
<Magnitude -0. mag(nanomaggy)>
One would need to similarly account for all the spelling variations: maggie
, maggies
, nmgy
, nanomaggies
, etc.
When reading a redshift catalog with nanomaggy units, astropy.table.Table
complains with:
WARNING: UnitsWarning: 'nanomaggy' did not parse as fits unit:
At col 0, Unit 'nanomaggy' not supported by the FITS standard.
If this is meant to be a custom unit, define it with 'u.def_unit'.
To have it recognized inside a file reader or other code, enable
it with 'u.add_enabled_units'. For details, see
https://docs.astropy.org/en/latest/units/combining_and_defining.html
[astropy.units.core]
I have not succeeded in following those instructions to eliminate the message, however.
Attempt 1, define "nanomaggy":
import astropy.units as u
nanomaggy = u.def_unit('nanomaggy', 1.0e-9 * 3631*u.Jy)
u.add_enabled_units( [nanomaggy,] )
Result:
ValueError: Object with name 'nanomaggy' already exists in namespace.
Filter the set of units to avoid name clashes before enabling them.
Attempt 2, use the "nanomaggy" already defined in astropy:
import astropy.units as u
u.add_enabled_units( [u.Unit('nanomaggy'),] )
from astropy.table import Table
zcat = Table.read('/pscratch/sd/s/sjbailey/desi/dev/zcat_options/ztile-sv2-dark-cumulative-main.fits')
Result, same warning message as before:
WARNING: UnitsWarning: 'nanomaggy' did not parse as fits unit:
At col 0, Unit 'nanomaggy' not supported by the FITS standard.
If this is meant to be a custom unit, define it with 'u.def_unit'.
To have it recognized inside a file reader or other code, enable
it with 'u.add_enabled_units'. For details, see
https://docs.astropy.org/en/latest/units/combining_and_defining.html
[astropy.units.core]
i.e. astropy.units knows about nanomaggy, but how do we get the astropy.io.fits reader to acknowledge that?
Indeed, maggy
is defined in astropy.units.photometric
. However, it is nevertheless not part of the FITS standard. That is what the first two lines of the warning say.
I don't know why the second part of the warning is printed though, and we may have to create an issue on astropy to explore that.
Yeah, (nano)maggy is not part of the FITS standard, but the second part of the error message implies that there is some way of registering the unit to have it recognized as valid anyway, which I thought would silence the warning.
As a brute force alternative, elsewhere in desispec.io we've used code like warnings.filterwarnings('ignore', message="'.*nanomaggies.* did not parse as fits unit.*")
to silence warnings that we didn't care about.
Here's the problem, it's built into the mechanism to parse a unit as a FITS unit, and therefore the message is actually misleading.
In other words, since maggy
is not in the FITS standard, it can't be parsed as a FITS unit, even though it is defined in other unit systems. But the error message doesn't know about that.
So I think this should be reported upstream, I'll try to take care of that later today. Filtering the warning, since it is clearly misleading, may be the best option.
@sbailey, regarding your code snippet above. If I try to reproduce it, I do get the same warning, but the table is read and the column does have units of nanomaggy
. So is there something I'm missing, or do you just not want to see the warning?
Yes, the table is read and the units are fine. The only problem is the extraneous verbose warning message.
One of our standard VAC recommendations is to add units to table columns. It would be helpful if we had an easy tool to do this to help users do the right thing. Below are suggested interfaces to implement to streamline adding units to a FITS table post-facto.
function interface
Or maybe split out
strictunits
vs.strictcolnames
instead of using a singlestrict
for bothcommand line
Wrap the above function in a command line script to post-facto update the units in a FITS file on disk:
update_table_units --input/-i FILENAME [--extname/-e EXTNAME] --units COLNAME1=UNITS1 COLNAME2=UNITS2 ...
--extname is optional only if the file contains a single table in HDU 1.
doing it right the first time
It would also be helpful if we had cookbook documentation about how to add units in the original code writing the tables when the source data is an astropy Table, a numpy structured array, or a Pandas dataframe. The spectro pipeline does this with
desispec.io.util.write_bintable
, which is not particularly simple and not ideal as a dependency for science codes writing tables if they don't otherwise have a dependency on desispec. Do we have simpler recipes to recommend?Bonus points: comments
It would also be nice if the same utilities could easily add comments to table columns (like
write_bintable
) so that the files are more naturally self-documenting.