dstndstn / astrometry.net

Astrometry.net -- automatic recognition of astronomical images
http://astrometry.net

Index files: use FITS-standard byte order and table field formats #284

Closed cgobat closed 9 months ago

cgobat commented 9 months ago

The FITS standard mandates big-endian (i.e., MSB-first) byte ordering across the board (see §5). The data in Astrometry.net index files are currently stored in little-endian format, with the FITS data type simply left as naïve byte strings. Not only is this unintuitive in light of the FITS standard, it also adds an unnecessary abstraction layer that requires extra machine-dependent instructions/documentation, as well as an additional pre-processing step (regardless of one's computer architecture) to convert the bytes into numeric data prior to use. This conversion cannot even be done entirely programmatically: the only place the actual data types are described is in the header COMMENTs, so FITS reader software doesn't know a priori how to interpret the data without a human setting each type manually.

All of the aforementioned issues can be resolved simply by using the existing FITS binary table data type/structure definition keywords to present the data in a FITS-native way. For instance, rather than leaving the TFORM1 parameter for the quads HDU as 16A (i.e., 16-byte strings/blobs), setting it to 4J (i.e., sets of four 32-bit integers) and swapping the byte order to be FITS-compliant allows FITS I/O programs to read the numeric array directly, and also provides a more faithful representation of the intent/significance of the data. The same principle can be applied to all of the other HDUs in each file.
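To make the 16A-to-4J change concrete, here is a minimal Python sketch of re-encoding a single quads row. It assumes each row holds four little-endian 32-bit integers, as described above. The function name `quad_row_to_4j` is hypothetical, and this is not the script used to produce the attached file.

```python
import struct


def quad_row_to_4j(row: bytes) -> bytes:
    """Re-encode one 16-byte little-endian quads row as big-endian 4J.

    Assumption: the row is four 32-bit integers. The byte swap is the same
    whether the values are treated as signed or unsigned.
    """
    if len(row) != 16:
        raise ValueError("expected a 16-byte row")
    values = struct.unpack("<4i", row)   # decode as stored today (little-endian)
    return struct.pack(">4i", *values)   # encode in FITS order (big-endian)


# Example row holding the integers 1, 2, 3, 4 in little-endian order.
example_row = struct.pack("<4i", 1, 2, 3, 4)
swapped_row = quad_row_to_4j(example_row)
```

The row stays 16 bytes long either way; only the order of the bytes within each 4-byte integer changes.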

See the attached index-4210-modified.fits.gz for an example of this reformatting. Below is a table summary of the updated HDUs contained therein. I've also added EXTNAMEs for easier identification.

| # | EXTNAME | Type | Cards | NAXIS2 | TFORM1 |
|---|---|---|---|---|---|
| 0 | PRIMARY | PrimaryHDU | 105 | | |
| 1 | QUADS | BinTableHDU | 16 | 580800 | 4J |
| 2 | KD_HEADER_CODES | BinTableHDU | 114 | 0 | 0A |
| 3 | KD_LR_CODES | BinTableHDU | 20 | 32768 | J |
| 4 | KD_SPLIT_CODES | BinTableHDU | 25 | 32767 | I |
| 5 | KD_RANGE_CODES | BinTableHDU | 32 | 9 | D |
| 6 | KD_DATA_CODES | BinTableHDU | 19 | 580800 | 4I |
| 7 | KD_HEADER_STARS | BinTableHDU | 91 | 0 | 0A |
| 8 | KD_LR_STARS | BinTableHDU | 20 | 16384 | J |
| 9 | KD_SPLIT_STARS | BinTableHDU | 25 | 16383 | J |
| 10 | KD_RANGE_STARS | BinTableHDU | 31 | 7 | D |
| 11 | KD_DATA_STARS | BinTableHDU | 19 | 363000 | 3J |
| 12 | SWEEP | BinTableHDU | 14 | 363000 | 1B |
| 13 | J_MAG | BinTableHDU | 15 | 363000 | 1E |
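For readers less familiar with the TFORM codes in this table, the following Python sketch maps the simple forms used here (an optional repeat count followed by one type letter) onto big-endian `struct` format strings. The helper name `tform_to_struct` is hypothetical. It covers only the type letters that appear above plus K, and it does not handle variable-length (P/Q) columns or other TFORM variants.

```python
import re
import struct

# FITS binary-table type letters mapped to struct codes of the same width:
# A = character (1 byte), B = unsigned byte, I = 16-bit int, J = 32-bit int,
# K = 64-bit int, E = 32-bit float, D = 64-bit float.
_TFORM_TO_STRUCT = {
    "A": "s", "B": "B", "I": "h", "J": "i", "K": "q", "E": "f", "D": "d",
}


def tform_to_struct(tform: str) -> str:
    """Translate a simple TFORM value such as '4J' into a struct format."""
    match = re.fullmatch(r"(\d*)([A-Z])", tform)
    if match is None or match.group(2) not in _TFORM_TO_STRUCT:
        raise ValueError(f"unsupported TFORM: {tform!r}")
    # A missing repeat count means a single element.
    repeat = int(match.group(1)) if match.group(1) else 1
    # '>' selects big-endian byte order with standard sizes, as FITS requires.
    return f">{repeat}{_TFORM_TO_STRUCT[match.group(2)]}"


# Bytes per row for a few of the TFORM1 values listed in the table.
row_bytes = {t: struct.calcsize(tform_to_struct(t)) for t in ("4J", "4I", "3J", "D", "1B")}
```

A 4J row occupies the same 16 bytes as the original 16A row, so the change affects how the bytes are described and ordered, not the table size.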

Is there any reason to keep them as-is, as opposed to using FITS-compliant byte order and making them "self-aware" of their own data types?

dstndstn commented 9 months ago

Yes, there is a good reason for this. These files are loaded using the mmap() system call, which maps the contents of the file directly into memory, and together the HDUs hold multiple live KD-tree data structures (codes and stars). Byte-swapping the contents as FITS demands would mean that all the contents have to be byte-swapped upon reading -- pretty painful to implement.
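The trade-off can be sketched in Python (Astrometry.net itself is written in C, so this is only an illustration, not the project's code). The helper names `write_temp` and `map_uint32` are hypothetical. The sketch maps two small files of 32-bit unsigned integers: one stored in the host's native byte order, which is usable straight from the mapping, and one stored in the opposite order, where every value has to be swapped before use.

```python
import mmap
import os
import struct
import sys
import tempfile

values = [1, 2, 3, 4]
native = "<" if sys.byteorder == "little" else ">"
opposite = ">" if sys.byteorder == "little" else "<"


def write_temp(order: str) -> str:
    """Write `values` as 32-bit unsigned ints in the given byte order."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(struct.pack(f"{order}{len(values)}I", *values))
    return tmp.name


def map_uint32(path: str) -> list:
    """Map the file and reinterpret its bytes as native-order uint32 values."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # cast() reinterprets the mapped bytes in place; no byte swapping.
            with memoryview(mapped) as raw, raw.cast("I") as view:
                return view.tolist()


native_path = write_temp(native)
opposite_path = write_temp(opposite)
try:
    as_stored = map_uint32(native_path)     # correct without any conversion
    needs_swap = map_uint32(opposite_path)  # wrong until each value is swapped
    fixed = [int.from_bytes(v.to_bytes(4, "little"), "big") for v in needs_swap]
finally:
    os.remove(native_path)
    os.remove(opposite_path)
```

On a little-endian machine the second file corresponds to FITS byte order: the mapping alone is not enough, and the per-value swap is the extra work described above.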

cgobat commented 9 months ago

Okay, fair enough! I'm happy to be wrong as long as there's a good reason for it. 😄

Thanks for the response.