Closed cgobat closed 9 months ago
Yes, there is a good reason for this. These files are loaded using the mmap() system call, which maps the contents of the file directly into memory, and together the HDUs hold multiple live KD-tree data structures (codes and stars). Byte-swapping the contents as FITS demands would mean that all the contents have to be byte-swapped upon reading -- pretty painful to implement.
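To illustrate the trade-off described above, here is a minimal Python sketch (not the Astrometry.net implementation, which is written in C). The helper `read_int32s` is hypothetical, and the sketch assumes a platform where a C `int` is 4 bytes. When the file's byte order matches the host, the mapped pages can be viewed as integers in place; when it does not, every value has to be copied and swapped first.

```python
import mmap
import struct
import sys
import tempfile


def read_int32s(payload: bytes, file_order: str) -> list:
    """Write payload to a scratch file, mmap it, and decode it as int32 values.

    file_order is "little" or "big" and says how the file was written.
    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryFile() as fh:
        fh.write(payload)
        fh.flush()
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            if file_order == sys.byteorder:
                # Byte order matches the host: view the mapped pages as
                # integers in place, with no conversion pass.
                # Assumes the native C int ("i") is 4 bytes wide.
                with memoryview(mapped) as raw, raw.cast("i") as ints:
                    return ints.tolist()
            # Byte order differs: copy the bytes out of the mapping and
            # swap every value while decoding.
            prefix = "<" if file_order == "little" else ">"
            count = len(mapped) // 4
            return list(struct.unpack(f"{prefix}{count}i", mapped[:]))
```

On a little-endian machine, a little-endian file takes the first branch and a FITS-compliant big-endian file takes the second, which is the extra work the reply refers to.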
Okay, fair enough! I'm happy to be wrong as long as there's a good reason for it. 😄
Thanks for the response.
The FITS standard mandates the use of big-endian (i.e., MSB-first) byte ordering across the board (see §5). The data in Astrometry.net index files are currently stored in little-endian format, with the FITS data type simply left as naïve byte strings. This not only is unintuitive in light of the FITS standard, it also adds an unnecessary abstraction layer that necessitates extra machine-dependent instructions/documentation, as well as an additional pre-processing step (regardless of one's computer architecture) to convert the bytes into numeric data prior to use. This conversion also cannot even be done entirely programmatically, since the only place the actual data types are described is in the header COMMENTs, meaning FITS reader software doesn't know a priori how to interpret the data without a human setting each type manually.
All of the aforementioned issues can be resolved simply by using the already-existing FITS binary table data type/structure definition keywords to present the data in a FITS-native way. For instance, rather than leaving the TFORM1 parameter for the quads HDU simply as 16A (i.e., 16-byte strings/blobs), setting it to 4J (i.e., sets of four 32-bit integers) and swapping the byte order to be FITS-compliant allows FITS I/O programs to read the numeric array directly, and also provides a more faithful representation of the intent/significance of the data. The same principle can be applied to all of the other HDUs in each file.
See the attached index-4210-modified.fits.gz for an example of this reformatting. Below is a table summary of the updated HDUs contained therein. I've also added EXTNAMEs for easier identification.
Is there any reason to keep them as-is, as opposed to using FITS-compliant byte order and making them "self-aware" of their own data types?
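As a rough sketch of the 16A to 4J change proposed above, the following Python re-encodes one 16-byte little-endian record as four big-endian 32-bit integers, which is the layout a 4J column expects. The function name `quad_blob_to_4j` is hypothetical, and treating the values as signed is an assumption; the byte swap itself is the same for signed and unsigned integers.

```python
import struct


def quad_blob_to_4j(blob: bytes) -> bytes:
    """Re-encode a 16-byte little-endian record as four big-endian int32s.

    Hypothetical helper: assumes each record holds exactly four 32-bit
    integers, as described for the quads HDU.
    """
    if len(blob) != 16:
        raise ValueError("expected a 16-byte record")
    # Decode using the byte order the index files currently use...
    stars = struct.unpack("<4i", blob)
    # ...and re-encode MSB-first, as the FITS standard requires.
    return struct.pack(">4i", *stars)


# Example: a record as it might appear in a current index file.
little = struct.pack("<4i", 10, 20, 30, 40)
big = quad_blob_to_4j(little)
```

With the column declared as 4J and the bytes stored this way, a FITS reader can return the integers directly without consulting the header COMMENTs.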