esheldon / fitsio

A python package for FITS input/output wrapping cfitsio
GNU General Public License v2.0
134 stars 58 forks source link

OverflowError for unsigned long values #124

Open demitri opened 7 years ago

demitri commented 7 years ago

I'm reading a single row from the latest SDSS DR14 spAll file (spAll-v5_10_0.fits) and am getting this error:

In [15]: spAll0 = fitsio.read("spAll-v5_10_0.fits", rows=[0], ext=1)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-15-1ecfb84a9767> in <module>()
----> 1 spAll0 = fitsio.read("spAll-v5_10_0.fits", rows=[0], ext=1)

/usr/local/anaconda/lib/python3.5/site-packages/fitsio/fitslib.py in read(filename, ext, extver, **keys)
     99         item=_make_item(ext, extver=extver)
    100 
--> 101         data = fits[item].read(**keys)
    102         if header:
    103             h = fits[item].read_header()

/usr/local/anaconda/lib/python3.5/site-packages/fitsio/fitslib.py in read(self, **keys)
   1807             if 'rows' in keys:
   1808                 del keys['rows']
-> 1809             data = self.read_rows(rows, **keys)
   1810         else:
   1811             data = self._read_all(**keys)

/usr/local/anaconda/lib/python3.5/site-packages/fitsio/fitslib.py in read_rows(self, rows, **keys)
   1936                                           name,
   1937                                           self._info['colinfo'][colnum]['tscale'],
-> 1938                                           self._info['colinfo'][colnum]['tzero'])
   1939 
   1940         lower=keys.get('lower',False)

/usr/local/anaconda/lib/python3.5/site-packages/fitsio/fitslib.py in _rescale_and_convert_field_inplace(self, array, name, scale, zero)
   2409         numpy boolean values
   2410         """
-> 2411         self._rescale_array(array[name], scale, zero)
   2412         if array[name].dtype==numpy.bool:
   2413             array[name] = self._convert_bool_array(array[name])

/usr/local/anaconda/lib/python3.5/site-packages/fitsio/fitslib.py in _rescale_array(self, array, scale, zero)
   2435             array *= sval
   2436         if zero != 0.0:
-> 2437             zval=numpy.array(zero,dtype=array.dtype)
   2438             array += zval
   2439 

OverflowError: Python int too large to convert to C long

I ran into this problem in Nightlight as well. The problem can be reduced to reading this cell in the table:

fitsio.read(filename="spAll-v5_10_0.fits", rows=[0], columns=["SPECOBJID"], ext=1)

These are the header values for the relevant column:

TTYPE98 = SPECOBJID
TFORM98 = K
TSCAL98 = 1
TZERO98 = 9223372036854775808

The physical value should be: 11258999466550599680. The combination of the "K" TFORM value and the TZERO value is how the FITS format allows unsigned longs. The cfitsio library does not currently support unsigned long data types and throws a numeric overflow error. The physical value is calculated as:

value in table * TSCAL + 9223372036854775808 = 11258999466550599680

I've received no detail on when this might be fixed in the underlying library (i.e. not soon). The patch (given to me by Bill Pence) is to set TZERO for the column to 0, read the data from the file via cfitsio as stored in the file, then add "9223372036854775808" to the values yourself.

For your wrapper, you'd probably want to look for those TSCAL and TZERO values and change the mapped C type to "unsigned long" (but only then so as not to waste memory). Note also that this applies to several unsigned int types (see the FITS specification).

This will be an issue for many SDSS DR14 files, which (I'm sure you know) is due next month.

esheldon commented 7 years ago

As you can see I am actually doing the TZERO myself, cfitsio is not doing it.

There is no natural way to do deal with this given that the column is in fact declared as signed 8 byte. This is a giant kludge propagated in the cfitsio code base. It can work there because the user can always cast the type as needed.

However fitsio always assumes that you want to read into the type that is declared in the file; how else can we sensibly interpret the file without user override? Also fitsio internally reserves the right to read the bytes directly for big efficiency gains (hence doing tzero myself), so remapping the type won't work.

A dumb kludge would be for the user to tell fitsio to override the data type, but this would require twice the memory, since we would have to make a new version of the data and copy into it.

demitri commented 7 years ago

I completely agree; the whole thing is hacky.

While the column is declared as a 64-bit long signed integer (TFORM "K"), the FITS spec says that IF the declared type is a signed 64-bit integer AND TZERO = 9223372036854775808 (2^^63), then the column's physical data type is an unsigned 64-bit integer. See Table 11 in the FITS standard. It's a hack, but that's what the specification is. The cfitsio library doesn't help in that it doesn't support it.

However fitsio always assumes that you want to read into the type that is declared in the file; how else can we sensibly interpret the file without user override?

So, given the above, the column is in fact declared (in a completely hacky way) to be an unsigned 64-bit integer. No assumptions or second-guessing need to be made; it's what the file is unambiguously declaring. (A hack, yes, but unambiguous.)

I would request that these circumstances be recognized and the correct data type be applied. From the table referenced above, this applies to signed bytes, and unsigned 16-, 32-, and 64-bit integers. The alternative is that fitsio crashes (as it did for me), and the user having to manually set the data type for each column like this, which isn't really preferable.

In the meantime, is there a way I can read this data with the current version of fitsio by manually specifying the data type? Astropy really doesn't handle reading row by row for a 12GB HDU.

demitri commented 7 years ago

Another quick note - while cfitsio does allow you to specify the data type to read a column into, it will crash with a numeric overflow error reading such a columns internally before it gets to that point.

esheldon commented 7 years ago

I see.

Yes, I agree the data type can be inferred for these types.

No there is no way to do this in fitsio currently.